PROCEEDINGS
EDITED BY C. WOODS, G. B. LUCK, R. BROCHARD,
F. SEDDON & J. A. SLOBODA
Proceedings
Parallel sessions (9.30-10.30)

Movement, meaning and flux (symposium)
Convenor: Davidson, J. Chair: Cohen, A. Discussant: Sundberg, J.
Meaning and the specification of motion in music
9.30 Guile, L.: The expressive world of flux
10.00 Correia, J.: Music performance as intimate self disclosure
10.30 Davidson, J.: Exploring the body in the production and perception of performance

Human voice development (symposium)
Convener: Welch, G. Chair: Trehub, S.
Genesis of singing behaviour
White, P.: Long-term average spectrum analysis of developmental changes in children's voices (Abstract)
Howard, D.: Listener perception of girls and boys in an English cathedral choir
McAllister, A.: Psychological and perceptual voice characteristics in 10 year old girls as manifested in voice range profiles
Thurman, L.: Vocal self-expression and self-determination (Abstract)

Categorisation, similarity perception, cue abstraction (first of 3 sessions)
Convenors: Deliège, I., Louhivuori, J. Discussants: Baroni, M., Cross, I., Mélen, M.
Perception of similarity and related theories
9.30 Anagnostopoulou, C.
Similarity and categorisation in Boulez' Parenthese from the third piano sonata: a formal analysis
10.00 Addessi, A.: Segmentation of post tonal music
10.30 Dibben, N.: Motive structure and the perception of similarity

Instrumental learning: participation and practice
Chair: Hallam, S.
Perceived social support and children's participation in music
9.30 McPherson, G.: Self regulation and musical practice: a longitudinal study
10.00 Jorgensen, H.: Practice planning and instrumental achievement
10.30 Renwick, J.: I've got to do my scale first: a case study of a novice's clarinet practice

Rhythm and meter perception
Chair: Parncutt, R.
Hierarchic representations on complex meters
9.30 Vazan, P.: Mental manipulation of meter (Abstract)
10.00 Hugardt, A.: Children, pulse and music
10.30 DiMatteo, R.: Time estimation and time related features of auditory environment

Music performance
Chair: Gabrielsson, A.
Coordinating duo piano performance
9.30 Ashley, R.: The pragmatics of conducting: analysing and interpreting conductor's expressive gestures
10.00 Edlund, B.: Listening to oneself at a distance
10.30 Keller, P.: Attentional resource allocation in musical ensemble performance

Following sessions

The application of a theory of intrinsic musicality in therapy (read by C. Trevarthen) (Abstract)
Convenor: Trevarthen, C. Chair: Sloboda, J. Discussant: Pavlicevic, M.

The perception and performance of vibrato
Convenors: Timmers, R., Desain, P. & Honing, H. Chair: Honing, H. Discussant: Rodet, X.

Categorisation, similarity perception, cue abstraction (second of 3 sessions)
Convenor: Deliège, I. Chair: Louhivuori, J. Discussants: Baroni, M., Cross, I., Mélen, M.

Melodic representation
Chair: Kopiez, R.

Song memory
Chair: Costa-Giomi, E.

Models for pulse and rhythm
Chair: Povel, D.
Keynote abstract
Over recent decades, voice research has revealed several links between the physiology and the acoustics
of voice production in various types of singing. Yet it has been difficult to identify acoustic parameters that
are relevant to perceptual voice parameters. There are reasons to assume, however, that perceptual
descriptions are closely related to voice production, so that much of the gap between perceptual and
acoustic descriptions of voice reflects the complex relations between production and acoustic characteristics.
Aims.
This paper aims to elucidate links between perceptual and acoustic aspects of voice timbre by reviewing
research on the relationships between variations of physiological parameters, such as subglottal pressure,
larynx height, voice source, and vocal tract dimensions, and their effects on the acoustic characteristics.
These relationships will be illustrated by reference to physiological differences between voice classifications:
male and female, young and old, and operatic, pop and amateur singers.
Main Contribution.
The paper will comment on possible relationships between acoustic, physiological, and perceptual
descriptions of singing voices.
Implications.
The review will highlight those acoustic voice parameters which should be of direct relevance to perceptual
descriptions of voice quality and thus attempt to identify acoustic parameters that are perceptually and
physiologically informative.
Symposium introduction
Title of Symposium: Movement, meaning and flux
Symposium Convenor: DAVIDSON, Jane W. Dr
Symposium Rationale: The papers in this symposium focus on the human body as the site out of
which musical understanding emerges. The symposium addresses an issue of critical importance,
since the role of the body in music perception and cognition has been largely ignored.
Aims: This symposium aims to explore the role of bodily movement in the creation and
comprehension of a musical work. Four different research approaches to the issue are included to
illustrate the central role of the body in music listening, performing, teaching and creation.
Symposium introduction
CATEGORISATION<==>Similarity Perception<==>CUE ABSTRACTION
URPM - CRFMW
University of Liège
B - 4000 Liège
e-mail: Irene.Deliege@ping.be
Over several decades, research on categorisation has occupied an important place in the cognitive sciences,
particularly in psychology and artificial intelligence. Work on categorisation in the domain of music
perception, however, has developed mainly over the last ten years.
2. Aims
The program of the symposium is intended to bring together a group of young researchers who have undertaken
the study of categorisation from a variety of perspectives in connection with music perception. It is expected
that a synthesis of results will be developed during the round-table discussions, with a view to defining
future research perspectives.
session I:
Sunday morning
• Irène Deliège (Liège, Belgium):
Similarity and categorisation in Boulez' Parenthese from the 3rd piano sonata:
A formal analysis
session II:
Sunday afternoon
• Emilios Cambouropoulos (Vienna):
Musical schemata in children from 10 to 12 years of age: A study on segmentation and mental line elaboration
session III:
Monday morning
• Marc Mélen and Julie Wachsman (Liège):
Cognition and affect in music listening: inter-relations between musical structure, cue abstraction and continuous
measures of emotion
• Tuomas Eerola, Petri Toiviainen, Topi Järvinen and Jukka Louhivuori (Jyväskylä, Finland):
Discussants:
Mario Baroni (Bologna), Ian Cross (Cambridge), Marc Mélen (Liège)
To close the symposium: KEYNOTE SPEAKER ASSOCIATED WITH THE SYMPOSIUM
Mario Baroni : MUSIC AND MEANING
Monday morning
Proceedings paper
Abstract
The relationship between music and motion has attracted interest over a broad sweep of
history and across a variety of disciplines including aesthetics, psychology, music theory
and neuroscience, and the relationship itself has been regarded variously as metaphorical,
semiotic, and physiological. This paper will argue that the relationship between music
and motion: i) is a fundamental aspect of music's impact and meaning; ii) is significantly,
but not entirely, concerned with self-motion (as argued by Todd); iii) should be regarded
as a truly perceptual relationship - even though the motion that is perceived may be
illusory (in the sense of being virtual rather than real). The aim of the paper is to clarify
the nature and status of the relationship between music and motion, rejecting both the
physiological reductionism of regarding it as 'hardwired' and the potentially dismissive
view of it as merely metaphorical. In place of these, the paper will argue for motion as
being perceptually specified by the acoustical information in music.
The theory proposed here has specific empirical implications - most obviously an
investigation of the various kinds of information in music that specify motion, and a
consideration of whether these function in anything like the same manner as for real
motion. It also has implications for theories of musical meaning, since it allows for the
integration of the sense of (self-)motion with the other kinds of events (physical,
structural, cultural) as they are specified in music.
1. Introduction
The close relationship between music and movement has been acknowledged since the time of
classical Greek writings on music. More recently, a number of authors - with perspectives ranging
from philosophy and semiotics to neuroscience - have discussed the relationship, proposing
explanations ranging from hardwired physiology to metaphor. This paper will present an overview of
some of this work, and will propose a perceptual approach to the issue based on ecological principles.
In essence, I will propose that the sense of motion when listening to music is an inevitable
consequence of the event-detecting nature of the auditory system, that there are some interesting
questions about what listeners perceive as being in motion, and that the varieties of motion specified
in musical sound are central to listeners' perceptions of meaning in music. This paper will argue that
the relationship between music and motion: i) is a fundamental aspect of music's impact and meaning;
ii) is significantly, but not entirely, concerned with self-motion (as argued by Todd); iii) should be
regarded as a truly perceptual relationship - even though the motion that is perceived may be illusory
(in the sense of being attributed to virtual, rather than real, objects).
2. Music and motion: brief overview.
In his historical overview of rhythmic theory, going back to Greek theory, Yeston (1976) observes
that
"In the broadest sense, the theory of musical rhythm has always been concerned with the
elucidation of musical motion - motion that is differentiated by the durational value,
pitch, or intensity of sounds, but which, at the same time, presumably exhibits certain
regularities." (p. 1)
His survey lies clearly within the tradition of structuralist musicology, of which Shove & Repp
(1995), who provide an important survey of motion literature relating to performance, comment:
"Traditionally, to explain the source of musical motion, theorists, philosophers and
psychologists alike have turned to musical structure, which by most accounts is abstract.
This has led some to believe that the motion heard is virtual, illusory or abstract. ...
Hidden from this view is perhaps the most obvious source of musical movement: the
human performer. Why have so many theorists failed to acknowledge that musical
movement is, among other things, human movement?" (p. 58)
Their aim is to examine the way in which the real (or imagined) motion of the performer may be
specified in sound, and the ways in which the movements of both performers and listeners might be
harnessed to increase their aesthetic awareness of music. In doing so, Shove and Repp demonstrate
that a number of authors in the early part of the twentieth century (Sievers, Becking, Truslit - see
Shove & Repp, 1995) were particularly interested in the relationship between body movement,
gesture and performance (both musical and literary). Each of these authors developed his own lexicon
of movement types, the function of which was both analytical and practical: each lexicon was
intended both to reveal and understand the inner dynamics of works of music (and literature), and to
help performers to find a fluent and expressive approach in performance. Becking, whose ideas have
subsequently been taken up and developed by Clynes (e.g. Clynes, 1983), distinguished a number of
types or styles of movement curve, and attributed these different styles of movement to the music of
different groups of composers. Truslit (1938; see Repp, 1993) had no interest in the composer
specificity of musical motion, but was more concerned with the relationship between the acoustic
surface of a musical performance and the underlying motion dynamics of the piece. His interest was
therefore in the specifics of individual pieces, and with the discovery of the particular movement
patterns that would help listeners and performers to understand and project the music in the most
effective manner. A particular feature of his theory which links it with more recent work by Todd (see
below) is the proposal that the combination of larger (more global) and smaller (more local)
movements in performance, as well as in the physical responses of listeners, reflects the organisation of
the motor system into two divisions controlling whole body movement and more peripheral limb
movement respectively. The former (the ventromedial system) is closely associated with the
vestibular apparatus (as Truslit observed) which is responsible for our sense of balance and
movement, thus suggesting a possible direct physiological link between the perception of sound and
movement.
Although Shove and Repp briefly consider the wider specification of motion in music (i.e. motion that
is not necessarily attributable to the performer, but to some other agent, such as the listener's self
motion, or the motion of some virtual agent) this is not the focus of their review. Todd, however, has
given rather more attention to listening and the sense of self-motion that it can induce, without
resorting to the 'abstraction' of musical structure to which Shove and Repp refer. In a paper concerned
with the relationship between tempo and dynamics in performance, Todd (1992) concludes with the
proposal "that musical expression has its origins in simple motor actions and that the performance and
perception of tempo/musical dynamics is based on an internal sense of motion", and that "expressive
sounds can induce a percept of self-motion in the listener..." (p. 3549). The basis for this, Todd
speculates, may lie in the physiology of the inner ear, and in the possibility that sound directly
activates the vestibular apparatus which is responsible for our sense of self-motion. In a subsequent
paper, however, Todd (1995) notes the need to be cautious about the relationship between musical
expression and physical motion, distinguishing between "purely metaphorical notions of musical
motions and any more psychologically concrete phenomena that correspond to the metaphorical."
(Todd, 1995; p. 1946)
Empirical evidence for listeners' sense of motion in rhythm is provided by Gabrielsson (1973), who
had listeners rate a battery of simple rhythmic figures on a large number of adjectival descriptors, and
analysed these judgements using factor analysis. Three main dimensions emerged: a 'structure' factor
was an indicator of metre (triple or duple) and pattern complexity; an 'emotion' factor was an indicator
of broadly positive and negative emotional attributions; and a 'motion' factor picked up listeners' sense
of the movement quality of each rhythm, using labels such as 'running', 'limping', 'flowing', 'crawling'
etc. The relationship between music and dance (as well as marching and other forms of physical
activity) is an obvious and ancient one, and can be seen in concert music in the persistence of dance
forms right up to the present day, quite apart from its more manifest importance in virtually every
form of popular and traditional music. But the striking feature of Gabrielsson's research is the
spontaneous emergence of this factor in the context of 'laboratory' conditions and very simple single
line rhythms played on a drum. The implication is that the motion character is a pervasive deep-seated
component of listeners' responses to music.
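Gabrielsson's procedure, rating each rhythm on many adjective scales and then factor-analysing the ratings, can be sketched with synthetic data. The numbers below (listeners, scales, loadings, noise level) are illustrative assumptions, not Gabrielsson's data; the sketch simply shows how a small number of latent dimensions re-emerge from the correlation structure of adjective ratings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not Gabrielsson's data): 12 adjective scales driven by
# three latent dimensions ('structure', 'emotion', 'motion'), four scales each.
n_obs, n_scales, n_factors = 200, 12, 3
loadings = np.zeros((n_scales, n_factors))
for k in range(n_factors):
    loadings[4 * k:4 * (k + 1), k] = 1.0       # each scale loads on one factor

latent = rng.normal(size=(n_obs, n_factors))   # per-rating factor scores
ratings = latent @ loadings.T + 0.5 * rng.normal(size=(n_obs, n_scales))

# Factor-extraction sketch: eigenvalues of the inter-scale correlation matrix,
# retaining factors by the Kaiser criterion (eigenvalue > 1).
corr = np.corrcoef(ratings, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_retained = int(np.sum(eigenvalues > 1.0))
print(n_retained)
```

With three genuine latent dimensions and moderate rating noise, three eigenvalues stand well clear of the rest, which is the pattern Gabrielsson's analysis revealed in listeners' judgements.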
3. The Metaphor of Motion.
In his book on the aesthetics of music, Scruton (1997) devotes considerable space to a consideration
of music's motion character. The starting point for Scruton's argument is the distinction between
sound and tone, identifiable in
"three important distinctions: that between the acoustical experience of sounds, and the
musical experience of tones; that between the real causality of sounds, and the virtual
causality that generates tone from tone in the musical order; and that between the
sequence of sounds and the movement of the tones that we hear in them. These
distinctions are parts of the comprehensive distinction between sound and tone... When
we hear music, we do not hear sound only; we hear something in the sound..." (Scruton,
1997: 19)
In common with many other writers, Scruton argues strenuously for a fundamental distinction
between the sounds of the 'everyday' world, and the tone(s) of music. This is important because it
places the motion experiences of listeners in a realm that is quite separate from those same listeners'
auditory experiences of the motional character of everyday objects in the 'real' world - the sound of
footsteps approaching, of cars passing, of balls bouncing, of bottles breaking, of water gushing and so
on. But despite (or perhaps even because of) the separation of musical motion from the motion of
objects in the world, Scruton places the sense of motion at the centre of musical experience.
"Whenever we hear music, we hear movement...", he writes (p. 55), and elsewhere "...we must hear
the movement in music, if we are to hear it as music." (p. 52).
His explanation for this sense of movement in music, relating as it does to an acousmatic space
divorced from the real spaces of the world, is that it depends on a deep-seated metaphor:
"[The] idea of musical movement is an irreducible metaphor, which can be explained
only through our response to music. It is associated with other metaphors - and in
particular with the metaphor of life. In hearing the movement in music we are hearing
life - life conscious of itself..." (p. 353)
And taking the metaphor of life further, Scruton suggests that we
"think of music as spread out in acousmatic space, where a new kind of individual is born
and lives out its life: an individual whose character is constantly changing in response to
the musical surroundings.
These musical individuals are not, of course, concrete particulars, like tables and chairs.
... They are heard as individuals; but any attempt to identify them must lean upon
acoustical criteria - according to which they are not individuals at all, but repeatable
patterns or types." (p. 72-3)
Scruton's approach relies on taking the first step of claiming that musical events are 'secondary
qualities' - i.e. not tied to the physical circumstances of the real world but separated from them, and
capable of 'behaving' in ways that are not constrained by the real world. His notion of movement,
gesture etc. is necessarily abstract and (in some sense) idealised - because the 'things' that are moving
are metaphysical rather than physical. The movement and space are metaphorical because the
properties of real space and movement have been transferred across to another domain where they
have no literal application. Scruton is aware of the arguments for the central role of metaphor in
human understanding (he makes reference to the work of Lakoff & Johnson on several occasions), but
he nonetheless draws a clear line between the tangible and practical world of sound, and the abstract
and incorporeal domain of tone - and in doing so aligns himself unambiguously with the aesthetic
tradition of music's autonomy.
But why make the first move of insisting upon the separation between sound and tone? The sounds of
music can and obviously do specify objects and events in the world (instruments and the people who
play them), and kinds of action, even when the nature of what is acting is unclear or uncertain (we
may hear blowing or scraping without knowing exactly what is being blown or scraped). In this most
obvious sense the sounds of music are the sounds of the 'real world'. A listening strategy that was
concerned solely with identifying instruments and playing actions would undoubtedly be a limited one
(although the importance of music's instrumental character has been seriously undervalued by the
psychology of music, aesthetics and music theory - all of which are unduly focused on pitch and
rhythm), but a range of other possibilities is opened up by considering the way in which musical
sounds may specify the objects and events of a virtual environment. By analogy, when looking at a
painting (representational or abstract), an important part of the perceptual experience is to see the
forms and colours as specifying a 'virtual space' which has properties that are reminiscent of the real
spaces that we know, or that intrigue us by defying the normal rules of space (the striking and
puzzling quality of some of M. C. Escher's drawings, for example, depends on this). The same
principle applies to hearing: sounds can specify a virtual domain that both abides by and defies the
normal laws of physics.
But scene analysis itself extends to some quite large-scale and complex features - and dynamic and
motional character is one such. In a passage which draws attention to the close relationship between
the way this character is conveyed in music and in 'real life', Bregman writes:
"Transformations in loudness, timbre, and other acoustic properties may allow the
listener to conclude that the maker of a sound is drawing nearer, becoming weaker or
more aggressive, or changing in other ways. However, in order to justify such
conclusions, it must be known that the sounds that bear these acoustic relations to one
another are derived from the same source. ... This strategy of allowing discrete elements
to be the bearers of form or 'transformation' only if they are in the same stream is the
foundation of our experience of sequential form in music as much as in real life."
(Bregman, 1990: 469)
As remarked earlier, if the sounds of music were restricted to specifying only real physical sources
(instruments and their modes of activation), an ecological approach to music perception might seem to
offer a limited prospect, and this is perhaps why Gaver (1993a) draws a sharp line between musical
and everyday listening in his important paper on the ecology of audition. But there are ways of
developing the ecological principle far beyond the recognition of instrumental sources, as I have
proposed elsewhere (Clarke, 1997; 1999; see also Windsor, 1995), and as Bregman also admits when
he adopts an idea of McAdams' - that of virtual sources. It is in this idea that the potential of an
ecological approach lies for the kind of motion perception that is the specific subject of this paper.
McAdams (1984) coined the term 'virtual source' by analogy with the terms virtual image or virtual
object in optics, which refer to the objects and images seen in mirrors and pictures, and which occupy
the virtual space behind the plane of the picture or mirror.
"A composer may want the listener to group the sounds from different instruments and
hear this grouping as a sound with its own emergent properties. This grouping is not a
real source, however. Stephen McAdams has referred to it as a 'virtual' source, perhaps
drawing the analogy from the virtual image of optics ... Experiences of real sources and
of virtual sources are both examples of auditory streams. They are different not in terms
of their psychological properties, but in the reality of the things that they refer to in the
world. Real sources tell a true story; virtual sources are fictional." (Bregman, 1990: 460)
In the case of a mirror, the virtual objects have a lawful relation with the real objects of which they are
a reflection, move in a fashion that is identical to the corresponding movements of their real
counterparts (of which they are a reflection), and are described by an optics identical to that of the real
world (albeit right/left reversed). In a picture, or film/animation, the objects have qualities that may
mimic those of the real world, and can do so very convincingly in the case of trompe l'oeil painting,
or computer animation, but achieve these qualities using quite different means. Gibson noted this in
his discussion of painting and drawing (Gibson, 1966; p. 240) when he pointed out (in relation to a
painting of Sir Thomas More by Holbein) that the sense of folding and texture on More's velvet sleeve
is achieved by Holbein using pigments of different hues, while the same visual effect produced by a
real piece of velvet is achieved by differential reflectance and shadow. Assuming that the painting of
the velvet is completely convincing, the same perceptual effect is produced by completely different
means: paintings specify the shapes and positions of a virtual space with quite different methods
(pigments, shading, geometry) than the 'same' effects in the real world (reflection, shadow,
orientation).
The comparison with painting is instructive because it suggests a way of understanding both what it is
that we perceive as moving in music, and how the effect is produced. Just as the disposition of
pigment in a painting can create a perceptual effect analogous to that produced by reflectance and
shadow in the real world, so music may create perceptual effects with the disposition of discrete
pitches and instrumental timbres in time that reproduce, or approximate to, those that we experience
with the continuous acoustical transformations that are characteristic of real world events. It is
important to be cautious about making too simple a translation from the spatial domain of pictures to
the temporal domain of music, however, and to avoid representing music as little more than 'sound
pictures'.
5. Ecology extended: motion, gesture, meaning
While it seems obvious that visual information can specify movement, there is more resistance to the
idea that the same might be true of sound. Certain kinds of acoustical information are readily accepted
as specifying motion (e.g. continuously changing left ear/right ear intensity balance or phase relation;
or the pitch shift of the Doppler effect) since these directly specify real movement in real space. The
experiences of vivid sound tracks or 'demonstration' sequences in surround-sound systems or Dolby
Digital cinemas are indicative of how powerful this effect can be even when artificially generated. But
the much more general possibility is that rhythmic effects, timbral changes, dynamic changes, pitch
patterns etc. have the capacity to specify motion in a virtual space - in the same way that the swirls
and textures and so on of computer animation specify motion in the virtual spaces of the film. Just as
the success of animation depends on the propensity of the visual system to detect movement in even
quite poor approximations to the changing visual arrays that specify real movement, so too the
perception of motion and gesture in music relies on the detection of motional and gestural invariants
in sound sequences which may be quite poor approximations to their real-world counterparts. The
temporal, timbral and dynamic components of music can be seen as having a direct capacity to specify
objects and events (for a detailed analysis, see Todd, O'Boyle & Lee, 1999).
This final section can do little more than sketch the outlines of such an approach - for two reasons:
first, a more complete account is beyond the scope of this paper; and second the empirical
investigations needed to explore the ideas presented here do not exist. But the principle itself can be
stated simply enough: since sounds in the everyday world specify (among other things) the motional
characteristics of their sources, it is inevitable that musical sounds will also specify movements and
gestures, both the real movements and gestures of their actual physical production (as discussed
above) and also the fictional movements and gestures of the virtual environment (cf. Windsor, 1995)
which they conjure up. In certain respects, this is not a new idea at all: Langer (1942) wrote of the
way in which musical meaning is tied up with its capacity to convey movement and gesture, but the
difference here is the claim that this relationship is truly perceptual rather than metaphorical, symbolic
or analogical. For obvious adaptive reasons, the auditory system is highly attuned to the
movement-specifying properties of sounds, and since the variety of movements by animate and
inanimate objects is unlimited, every musical sound will specify some kind of movement. The most
obvious way in which movement is specified is in the temporal properties of musical sound, since any
temporal property can specify movement. The crucial question is therefore what a listener hears as
being in motion.
As discussed earlier, Todd has proposed that the sense of motion in music is a sense of self-motion.
For him this is a necessary consequence of the vestibular explanation that he proposes: if musical
sound directly stimulates the vestibular apparatus, then this will induce a sense of self-motion in just
the same way as the well-documented manner in which visual information can. But if a perceptual
approach is adopted, self-motion is not the only kind of motion that listeners might experience -
although the relativity of motion means that it is always an option. When you look out of the window
of a stationary train and see another train move, the well-known phenomenon of experiencing either
illusory self-motion or the true movement of the other train is an illustration of this relativity. In a
similar way, sound specifies relative motion but cannot alone specify which object is moving in
relation to the other: in terms of pitch shift, the Doppler effect, for example, is identical whether
caused by a sound-emitting object approaching and passing a stationary observer, or a moving
observer passing a stationary sound-emitting object. In the absence of appropriate empirical evidence,
it is impossible to assert the extent to which listeners perceive musical motion as self-motion or the
movement of other objects, but intuition suggests the experience of self-motion is not uncommon. In
part this may be attributed to a simple principle of ecological acoustics: if all the separate sound
sources that are (or appear to be) specified in some music are heard to behave and move in a
correlated fashion, then the auditory scene specified is that of a listener moving in relation to a
collection of stationary sound sources (self-motion). If, however, the various sound sources move
relative to one another and in relation to the listener, then this specifies the movements of external
objects in relation to one another. In very simple terms this suggests, for instance, that music that has
broadly polyphonic and contrapuntal properties is likely to be heard in the latter category, while
monodic or homophonic music may more easily specify self-motion.
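The Doppler point can be made concrete with the standard classical formulas for sound in a stationary medium; the tone frequency and speeds below are illustrative assumptions. At everyday speeds, far below the speed of sound, the source-moving and observer-moving cases yield almost the same pitch shift, which is why the sound alone leaves the attribution of motion open.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def doppler_source_moving(freq, v_source, c=SPEED_OF_SOUND):
    # Stationary listener, source approaching head-on at v_source (m/s).
    return freq * c / (c - v_source)

def doppler_observer_moving(freq, v_observer, c=SPEED_OF_SOUND):
    # Stationary source, listener approaching head-on at v_observer (m/s).
    return freq * (c + v_observer) / c

freq = 440.0  # Hz, an arbitrary illustrative tone
for v in (5.0, 10.0, 20.0):
    f_src = doppler_source_moving(freq, v)
    f_obs = doppler_observer_moving(freq, v)
    print(f"{v:4.0f} m/s: source-moving {f_src:.2f} Hz, observer-moving {f_obs:.2f} Hz")
```

The two cases agree to first order in v/c and differ only by terms of order (v/c) squared, so at walking or driving speeds the resulting pitch trajectories are effectively indistinguishable.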
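The parsimony principle just described, correlated source motion specifying self-motion, can be caricatured in a few lines. Everything here (the trajectories, the correlation threshold, the function name) is a hypothetical illustration, not a model proposed in the paper:

```python
import numpy as np

def classify_scene(trajectories, threshold=0.99):
    """Toy classifier. trajectories is an array (n_sources, n_timesteps) of
    listener-relative positions. If every source's frame-to-frame motion is
    (near-)perfectly correlated with every other's, the parsimonious reading
    is a moving listener among stationary sources."""
    steps = np.diff(trajectories, axis=1)          # per-frame displacements
    corr = np.corrcoef(steps)                      # inter-source correlations
    off_diag = corr[~np.eye(len(corr), dtype=bool)]
    return "self-motion" if np.all(off_diag > threshold) else "object motion"

t = np.linspace(0.0, 1.0, 100)
# Listener accelerating past three fixed sources: identical relative motion.
walking = np.vstack([5.0 - 2.0 * t**2, 8.0 - 2.0 * t**2, 3.0 - 2.0 * t**2])
# Sources wandering independently relative to a fixed listener.
scatter = np.vstack([5.0 - 2.0 * t**2, 8.0 + np.sin(3.0 * t), 3.0 + 0.5 * t**2])
print(classify_scene(walking), classify_scene(scatter))
```

The first scene, in which all relative motion is shared, is classified as self-motion; the second, in which the sources move relative to one another, as the motion of external objects, mirroring the polyphonic versus homophonic contrast suggested above.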
Let us consider a single musical example which may illustrate some of these points: the pair of
dramatic orchestral crescendos on a single note (B) that occurs in the interlude between scenes 2 and 3
of the third act of Alban Berg's opera "Wozzeck". The movement that is specified by these sounds is a
paradoxical one. On the one hand the complete absence of pitch change specifies stasis, while on the
other the continuous change in both timbre (in the first of the two crescendos) and dynamic (in both)
specifies continuous, inexorable and perhaps headlong motion. The net result is, perhaps, a sense of
highly focused and unswerving approach - the auditory equivalent of what Gibson called 'looming' or
'time-to-contact'. Gibson writes of the information which specifies approach to/of an object as
follows:
"Approach to a solid surface is specified by a centrifugal flow of the texture of the optic
array. Approach to an object is specified by a magnification of the closed contour in the
array corresponding to the edges of the object. A uniform rate of approach is
accompanied by an accelerated rate of magnification. ... The magnification reaches an
explosive rate in the last moments before contact. This accelerated expansion ... specifies
imminent collision." (Gibson, 1958, cited in Gibson 1979, p. 231)
This description of the information specifying approach reads as a very close parallel to the sounds of
these two orchestral crescendos, if one makes appropriate sensory substitutions: continuous dynamic
increase substitutes for flow of optical texture, and the pitch stasis provides the centrifugal quality
(imagine how different the effect of the example would be if the music were to trace some kind of
continuous or stepwise pitch trajectory at the same time as the crescendo). Relating to Gibson's
mention of "an explosive rate" of magnification, and the specification of imminent collision, this
aspect of the Wozzeck example is largely in the hands of the conductor who controls by exactly how
much, and at what rate, Berg's dynamic markings should be realised. But performances and recordings
of the work usually do seem to reach "an explosive rate" of intensity increase, and thus the sense of
imminent collision.
The two crescendos differ in a number of respects, and give rise to interestingly different movement
effects as a result. The first crescendo is achieved not only by continuous increases of dynamic within
each of the instruments or instrumental groups of the orchestra, but also by a complex pattern of
successive instrumental entries, so that the timbre and texture of the orchestral sound changes
continuously as its dynamic increases. Arguably, this results in a less focused and static 'looming'
quality than is achieved by the second crescendo, which consists simply of a huge orchestral tutti
crescendo. A second difference concerns the respective goals or 'contact moments' of the two
crescendos. The first, after a build up on unison B, ends with a 6-note orchestral chord, which is
played as a rhythmic unison on a downbeat, and coincides with the first note of a striking rhythmic
figure played solo and triple forte by the bass drum. The orchestral downbeat has the attack and
unanimity of an impact, and the sense of motion that the first crescendo conveys is therefore of
approach, followed by impact - out of which the bass drum appears as if in a blinding flash. The
second crescendo, by contrast, is an orchestral unison throughout, and consists solely of a dynamic
crescendo which ends not with a downbeat - indeed without any final 'event' at all - but with the
equivalent of a cinematic cut straight into the next scene of the opera. It is as if the imminent collision
never materialises, and the listener (or the music) is shot out into a new and completely unpredicted
space - as if passing through an invisible barrier into a new domain at the moment when collision
seemed inevitable. These descriptions may seem fanciful and unfounded, but it would be relatively
straightforward to establish empirically what the musical/acoustical conditions are that specify
collision, or rupture, or emergence, or linear movement, or movement with frequent directional
change, and so on. It simply has not yet been tried; but by drawing on the parallels with optical flow, texture
gradients, and coordinated versus independent component behaviour, the basis for which is
already established within auditory scene analysis research, it is a perfectly tractable proposition.
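Gibson's accelerating 'magnification' has a straightforward auditory analogue that is easy to synthesise. The sketch below is purely illustrative (the pitch, duration and distance parameters are my own choices, not derived from Berg's score): it generates a fixed-pitch tone whose amplitude follows a 1/distance law for a source approaching at constant speed, so that intensity growth reaches an 'explosive rate' only in the final moments before contact.

```python
import numpy as np

def looming_tone(freq=246.9, duration=5.0, sr=22050, d0=50.0, v=9.9):
    """Fixed-pitch tone whose intensity follows a 1/d^2 approach law.

    freq   : pitch in Hz (246.9 Hz, the B below middle C, is illustrative only)
    d0, v  : initial distance and approach speed in arbitrary units, chosen so
             the source almost reaches the listener at t = duration.
    Amplitude ~ 1/d, so intensity (amplitude squared) ~ 1/d^2, giving the
    accelerating growth Gibson describes for visual looming.
    """
    t = np.arange(int(sr * duration)) / sr
    d = d0 - v * t                   # distance closes linearly (constant velocity)
    amp = 1.0 / d
    amp /= amp.max()                 # normalise the peak to 1.0
    return amp * np.sin(2 * np.pi * freq * t)

tone = looming_tone()
# Most of the energy growth is concentrated in the final tenth of the signal:
early = np.abs(tone[: len(tone) // 2]).max()
late = np.abs(tone[-len(tone) // 10 :]).max()
assert late > 5 * early
```

Varying the pitch over time, as suggested above, would be a natural experimental contrast: the claim is that a static pitch plus accelerating intensity growth is what specifies unswerving approach.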
An issue already broached and again pinpointed by the Wozzeck example is the question of who or
what is moving. As the discussion above illustrates, there is an ambiguity about the agency to which
the movements described above should be attributed. This is perhaps made all the more concrete and
particular by the operatic context from which this music comes, and the drama of which it is a part.
Does each listener hear him or herself as moving towards some collision, or one of the characters of
the opera moving towards some collision, or indeed some other person or object? Given the dramatic
context in which this music takes place (the main character, Wozzeck, has just murdered Marie, his
partner), it is likely that many listeners will hear this as Wozzeck 'rushing to meet his fate' (or fate
rushing to meet him), or perhaps death rushing to meet Marie (heard now in immediate retrospect).
But equally it is possible for a listener to hear this as self-motion and as an identification with one of
the opera's characters. Writing at a much more psychophysical level of a similar kind of subject/object
uncertainty or fluidity, and of the relationship between rhythm and motion, Todd et al. (1999) express
the situation as follows:
"The essence of the sensory-motor theory is that the experience of rhythm is mediated by
two complementary representations: a sensory representation of the motional-rhythmical
properties of an external source on the one hand, and a motor representation of the
[listener's] musculoskeletal system, on the other. For any individual, the sensory systems,
by learning, tune in to the temporal-motional properties of the physical environment,
whilst the motor control system, also by learning, tunes into the dynamic properties of its
musculoskeletal system. If the temporal/dynamical properties of the source and the
musculoskeletal system are matched, then the motor image will tend to synchronise with
the source." (Todd, O'Boyle & Lee, 1999, p. 26)
It could be objected once again that the discussion of the "Wozzeck" example is highly speculative,
subject to an enormous amount of interpretative licence, and out of step with an ecological approach
since it seems to depend on the interpretation of sense data in the light of verbal and dramatic
information, rather than specification by stimulus invariants. But this would be to overlook the fact
that all of the factors mentioned (the drama, the characters, the sounds) are indeed part of the available
information for a viewer/listener. That there is considerable leeway for different perceptions of this
short musical extract is not a problem for an ecological approach: many environmental circumstances
can be perceived in more than one way, and aesthetic objects are particularly (and deliberately)
multivalent - not only because they often contain deliberately partial and perhaps conflicting
perceptual information (and therefore have the capacity to specify more than one state of affairs), but
also because the viewers/listeners who encounter them, even when drawn from notionally the same
culture, may differ markedly in their previous experience of this or similar events. The differences in
exposure to the music of Berg, for example, among a sample of psychologists of music is likely to be
far greater than the differences between these same individuals' experiences of mountains.
6. Summary and Implications
To conclude, I have proposed in this paper that the sense of motion and gesture in music is a truly
perceptual phenomenon, and that the perceptual information that specifies motion is broadly speaking
the same as for the perception of motion in the everyday world. It is the temporal component
of music that is particularly rich in specifying motion. The sense of motion or self-motion draws the
listener into an engagement with the musical materials in a particularly dynamic manner (he or she
acts amongst them), and in so doing constitutes a vital part of musical meaning. The theory proposed
here has specific empirical implications - most obviously an investigation of the various kinds of
information in music that specify motion (and the work of Todd et al. is an important start in this
direction), and a consideration of whether these function in anything like the same manner as for real
motion. It also has implications for theories of musical meaning, since it allows for the integration of
the sense of (self-)motion with the other kinds of events (physical, structural, cultural) as they are
specified in music.
References
Bregman, A. S. (1990): Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge,
Mass.: MIT Press.
Clarke, E. F. (1997): Perception and Critique: Ecological Acoustics, Critical Theory and Music.
Proceedings of the International Computer Music Conference, Thessaloniki, Greece, 19-22.
Clarke, E. F. (1999): Subject-position and the specification of invariants in music by Frank Zappa and
P. J. Harvey. Music Analysis, 18, 347-374.
Clynes, M. (1983): Expressive microstructure in music, linked to living qualities. In J. Sundberg (Ed):
Studies of Music Performance. Publications issued by the Royal Swedish Academy of Music no. 39,
Stockholm.
Fowler, C. A. (1986): An event approach to the study of speech perception from a direct-realist
perspective. Journal of Phonetics, 14, 3-28.
Gabrielsson, A. (1973): Adjective ratings and dimension analyses of auditory rhythm patterns.
Scandinavian Journal of Psychology, 14, 244-260.
Gaver, W. W. (1993a): What in the world do we hear? An Ecological Approach to Auditory Event
Perception. Ecological Psychology, 5, 1-30.
Gaver, W. W. (1993b): How do we hear in the world? Explorations in ecological acoustics.
Ecological Psychology, 5, 285-313.
Gibson, J. J. (1966): The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Gibson, J. J. (1979/1986): The Ecological Approach to Visual Perception. Hillsdale, NJ: Lawrence
Erlbaum.
Langer, S. K. (1942): Philosophy in a New Key. Cambridge, MA: Harvard University Press.
McAdams, S. (1984): Spectral Fusion, Spectral Parsing, and the Formation of Auditory Images.
Unpublished PhD thesis, Stanford University.
Repp, B. (1993): Music as motion: a synopsis of Alexander Truslit's (1938) "Gestaltung und
Bewegung in der Musik". Psychology of Music, 21, 48-72.
Scruton, R. (1997): The Aesthetics of Music. Oxford: Oxford University Press.
Shove, P. & Repp, B. (1995): Musical motion and performance: theoretical and empirical
perspectives. In J. Rink (Ed.): The Practice of Performance. Studies in Musical Interpretation.
Cambridge: Cambridge University Press, pp. 55-83.
Todd, N. P. (1992): The dynamics of dynamics: a model of musical expression. JASA, 91, 3540-3550.
Todd, N. P. (1995): The kinematics of musical expression. JASA, 97, 1940-1949.
Todd, N. P., O'Boyle, D. J. & Lee, C. S. (1999): A Sensory-Motor Theory of Rhythm, Time
Perception and Beat Induction. Journal of New Music Research, 28, 5-28.
Truslit, A. (1938): Gestaltung und Bewegung in der Musik. Berlin: Chr. Friedrich Vieweg.
Warren, W. & Verbrugge, R. (1984): Auditory perception of breaking and bouncing events: a case
study in ecological acoustics. Journal of Experimental Psychology: Human Perception and
Performance, 10, 704-712.
Windsor, W. L. (1995): A Perceptual Approach to the Description and Analysis of Acousmatic Music.
Unpublished PhD thesis, City University, London.
Windsor, W. L. (1996): Perception and Signification in Electroacoustic Music. In R. Monelle and C.
T. Gray (Eds.): Song and Signification. Edinburgh University Faculty of Music.
Yeston, M. (1976): The Stratification of Musical Rhythm. New Haven: Yale University Press.
Proceedings paper
It has long been understood that context is significant in our perception of musical behaviours. For
those of us brought up in the Western cultural tradition, a sense of audience is often an important
factor in the perception, cognition and perceived worth and significance of music, particularly in
relation to archetypal 'high art' and 'popular' genres. Any differential status and reward reflect the
relative meanings that we, as individuals and as groups, assign to the musical products of our
(sub-)culture (cf. Finnegan, 1989; Burns, 1999; Carterette & Kendall, 1999). Singing to oneself in the
car, in the bathroom or whilst undertaking some daily chore is likely to be perceived differently from
singing on the concert stage in front of a paying audience. The former is private and personal, relaxed
and unselfconscious. In contrast, public singing involves a greater sense of 'performance', of implied
'correctness' against some perceived expectation of what counts as 'appropriate' musical behaviour in
this context.
So, an appreciation of context provides the socio-cultural and socio-musical framing for the
interpretation of a particular musical behaviour. Similarly, that which counts as 'music' within a
particular culture is also socially located. Particular groupings and patterns of sounds are accepted as
belonging to, even as exemplars of, specific musical genres. Although all musics may share basic
generic acoustic elements (related to acoustic waveform, duration, intensity and frequency), it is the
specific configuration of these in particular combinations that gives rise to the many different and
distinct musical genres.
Nevertheless, although an appreciation of context and the structure of the musical artefact are
necessary ingredients for an understanding of observed musical behaviour, they are not sufficient. A
provide evidence that they are already able to discriminate prosodic features of the dominant
(maternal) language, such as intonation and rhythm (Mehler & Christophe, 1995, p. 947).
Further evidence of the interfacing of socio-cultural context with neuropsychobiological development
is provided by analyses of lullabies and infant songs from different cultures and also of infant-directed
speech. The musical structures of the lullabies and songs are characterised by perceived simplicity,
descending intervals and contours and relatively few contour changes. These features parallel some
prosodic features of infant-directed speech (Unyk et al., 1992, p. 25). During these first few months of
life, parents 'consistently guide the infant towards at least three levels of vocal expertise' (Papousek,
1996, pp. 44-45). These levels gradually emerge during preverbal vocal development.
● level 1: initial fundamental voicing develops into prolonged, euphonic cooing (around the age
of 8 weeks); subsequently leading to phrasing and melodic modulation (two months);
● level 2: the vocal stream becomes segmented into syllables due to the use of consonants;
mothers facilitate this development through the use of rhythmic games and rhythmic melodies;
● level 3: canonical syllables appear and are treated by parents as 'protowords' to which are
attributed meanings, being assigned in a declarative manner to the naming of persons, objects
and events.
Adult-child dialogue is characterised by 'rich melodic modifications' during these first months of life
(Papousek, op. cit., p. 48), with repetitions, frequent glides, a prevalence of basic harmonic intervals
(3rds, 4ths, 5ths, 8ves) and sometimes dramatic changes in intensity.
Given the bipotentiality of early interactive vocalisation, it is not surprising to discover that the
borders between singing and speech are often blurred, particularly for the young child (Davies, 1994;
Davidson, 1994; Welch, 1994; Sergeant, 1994; Welch, Sergeant and White, 1996). Such blurring is
characteristic of one of the first phases of singing development for many young children, with singing
often described as 'chant-like', with a relatively restricted pitch range and melodic phrasing (Welch,
2000a). Nevertheless, some children become relatively skilled in the dominant musical song genre(s)
at a very early age, probably because of a greater match between individual potential and the
existence of a particularly nurturing, socio-cultural musical pathway.
Human voice development is, therefore, characterised by the extent to which the socio-cultural
context (including the dominant soundscape and its musical genres) interfaces with emergent physical
and mental abilities. Anatomical and physiological development creates limitations to the range and
variety of vocal products. Compared to the adult, for example, the infant larynx is much smaller, with
different ratios of cartilage to ligament and less-defined mucosal tissues (Thurman & Klitzke, 2000).
The onset of puberty heralds the onset of major physical changes to the underlying vocal structures,
especially (but not only) in adolescent males. It is only after puberty that the internal vocal ligament
and mucosa become fully mature. Subsequently, as the vocal structures age, there can be a
degeneration of muscle and connective tissue alongside an increased ossification and calcification of
the laryngeal cartilages (Welch & Thurman, 2000). Even so, older singers are still capable of singing
expressively, not least because biological age is only loosely related to chronological age for those
individuals who have maintained healthy and well-conditioned vocal instruments. So, ongoing
changes in the physical bases for voicing across the lifespan in both structure and function have an
impact on vocal production in singing, particularly for the young, adolescent and old. Therefore,
certain aspects of a predominant musical genre (such as its vocal pitch range, tessitura, or vocal style)
may be beyond, or inimical to, the capabilities of a particular individual at a particular time in their
lives.
Similarly, psychological processing and development determine the salient features of both musical
input and sung output. In a supportive context, a mismatch between musical task and current vocal
competency could produce a greater motivation towards success and mastery. Equally, however, such
a mismatch in a different context could result in avoidance behaviours, or singing behaviours that are
judged to be consistently 'inadequate' when set against a music's internal rule system (as is the case
with singing 'out-of-tune'). The linguistic terms applied to singing demonstrate the ambivalence with
which societies nurture or hinder potential singing competency, as well as the development of singer
identity. Labels such as 'tone-deaf', 'monotone', 'growler', 'poor pitch singer' and having a voice that is
'breaking' or 'broken' may be contrasted with language which celebrates the vocal 'purity' of the
cathedral chorister and those 'gifted' individuals who are regarded as 'beautiful', 'powerful', or 'natural'
singers. Such positive and negative features are characteristic of the socio-musical environment and
are likely to be closely related to an individual's value-emotive response to singing as a pleasurable or
unenjoyable activity.
Summary
Overall, there is a sense in which the 'genesis' of singing behaviour can be seen to have both a micro
as well as a macro meaning. With regard to the latter, pre- and perinatal musical experiences provide
the socio-cultural framework within which subsequent vocal utterances are shaped according to
dominant musical conventions towards that which counts as 'singing' within a particular (sub-)culture.
Birth is the beginning of voice production and it is possible to trace the genesis of singing behaviours
from this time. Initial vocal exploration during the cooing and babbling stages includes variations in
vocal pitch range, timbre and loudness. Concomitantly, there is a highly significant interaction with
parents (carers) using infant-directed speech that can facilitate a growing mastery of musical artefacts
and conventions. After the age of five 'children can hold onto a stable tonality throughout a song...the
typical 5-year-old has a fairly large repertoire of nursery songs from his or her culture' (Dowling,
1999).
Yet the nature of the physical development of the vocal instrument across the lifespan suggests that
each vocal transformation has its own genesis, especially after adolescence. Each of the periods of
early childhood, later childhood, puberty, adolescence, early adulthood, older adulthood and
senescence is marked by particular and sometimes significant changes in vocal structure and
(potentially) function. These physical changes require behaviour modification, a different
co-ordination of muscle groups and, in some cases, relearning of singing behaviours because the
previous motor programmes do not produce the required or expected singing outcomes.
Individual singing potential and achievement should be seen in relation to each of these lifespan
periods. If there is a succession of appropriately supportive contexts where singing 'task' and response
are matched, it should be possible to celebrate relatively accomplished singing behaviours from early
childhood through to senescence.
Bibliography
Burns, E.M. (1999). Intervals, Scales, and Tuning. In D. Deutsch (Ed.). The Psychology of Music [2nd
Edition]. London: Academic Press. pp 215-264.
Carterette, E.C. & Kendall, R.A. (1999). Comparative Music Perception and Cognition. In D.
Deutsch (Ed.). The Psychology of Music. [2nd Edition]. London: Academic Press. pp 725-791.
Cottrell, S. (1998). Partnerships in the classroom. British Journal of Music Education, 15(3), 271-285.
Davidson, L. (1994). Songsinging by Young and Old: A Developmental Approach to Music. In R.
Aiello and J.A. Sloboda (Eds.). Musical Perceptions, Oxford: Oxford University Press. pp 99-130.
Davies, C. (1994). The Listening Teacher: An Approach to the Collection and Study of Invented
Songs of Children Aged 5 to 7. Musical Connections: Tradition and Change. Auckland, NZ:
International Society for Music Education. pp. 120-127.
Dowling, W. Jay. (1999). The Development of Music Perception and Cognition. In D. Deutsch (Ed.).
The Psychology of Music. [2nd Edition]. London: Academic Press. pp 603-625.
Durrant, C. (1998). Developing a choral conducting curriculum. British Journal of Music Education.
15(3), 303-316.
Finnegan, R. (1989). The Hidden Musicians. Cambridge: Cambridge University Press.
Folkestad, G. (1998). Musical Learning as Cultural Practice: As Exemplified in Computer-Based
Creative Music-Making. In B. Sundin, G.E. McPherson & G. Folkestad (Eds.). Children Composing.
Malmö: Lunds University. pp 97-134.
Gardner, H. (1983). Frames of Mind. London: Heinemann.
Hallam, S. (1998). The Predictors of Achievement and Dropout in Instrumental Tuition. Psychology
of Music, 26(2), 116-132.
Hargreaves, D.J. (1996). The development of artistic and musical competence. In I. Deliege & J.
Sloboda (Eds.). Musical Beginnings. Oxford: Oxford University Press. pp 145-170.
Harwood, E. (1998). Musical learning in Context: A Playground Tale. Research Studies in Music
Education, 11, 52-60.
Lecanuet, J-P. (1996). Prenatal auditory experience. In I. Deliege & J. Sloboda (Eds.). Musical
Beginnings. Oxford: Oxford University Press. pp 3-34.
Lecanuet, J-P. (1998). Foetal responses to auditory and speech stimuli. In A. Slater (Ed.). Perceptual
Development. Hove: Psychology Press. pp 317-355.
Mehler, J. & Christophe, A. (1995). Maturation and Learning of Language in the First Year of Life. In
M.S. Gazzaniga (Ed.). The Cognitive Neurosciences. Cambridge, Mass: MIT Press. pp 943-954.
Nettl, B. (1983). The Study of Ethnomusicology. Urbana: University of Illinois Press.
OfSTED. (1998). The Arts Inspected. Oxford: Heinemann.
Papousek, H. (1996). Musicality in infancy research: biological and cultural origins of early
musicality. In I. Deliege & J. Sloboda (Eds.). Musical Beginnings. Oxford: Oxford University Press.
pp 37-55.
Patel, A. D. & Peretz, I. (1997). Is music autonomous from language? A neuropsychological
appraisal. In I. Deliege & J. Sloboda (Eds.). Perception and Cognition of Music. Hove, UK:
Psychology Press. pp 191-215.
Sergeant, D.C. (1994). Towards a Specification for Poor-Pitch Singing. In G.F. Welch and T. Murao
(Eds.). Onchi and Singing Development. London: David Fulton. pp 63-73.
Sloboda, J.A. (1985). The musical mind: The cognitive psychology of music. Oxford: Clarendon
Press.
Spender, N. (1987). Psychology of Music. In R.L. Gregory (Ed.), The Oxford Companion to the
Mind. Oxford: Oxford University Press. pp 499-505.
Thurman, L. & Klitzke, C. (2000). Highlights of physical growth and function of voices from
pre-birth to age 21. In L. Thurman and Welch, G.F. (Eds.). Bodymind and Voice: Foundations of
Voice Education. [2nd Edition]. Iowa: National Center for Voice and Speech.
Trehub, S.E., Schellenberg, G. & Hill, D. (1997). The origins of music perception and cognition: A
developmental perspective. In I. Deliege & J. Sloboda (Eds.). Perception and Cognition of Music.
Hove, UK: Psychology Press. pp 103-128.
Unyk, A.M., Trehub, S.E., Trainor, L.J. & Schellenberg, E.G. (1992). Lullabies and Simplicity: A
Cross-Cultural Perspective. Psychology of Music, 20(1), 15-28.
Van Lancker, D. (1997). Rags to Riches: Our Increasing Appreciation of Cognitive and
Communicative Abilities of the Human Right Cerebral Hemisphere. Brain and Language, 57(1), 1-11.
Webster, P.R. (1998). Young Children and Music Technology. Research Studies in Music Education,
11, 61-76.
Welch, G.F. (1994). The Assessment of Singing. Psychology of Music, 22, 3-19.
Welch, G.F. (1998). Early Childhood Musical Development. Research Studies in Music Education,
11, 27-41.
Welch, G.F. (2000a). Singing Development in Early Childhood: the Effects of Culture and Education
on the Realisation of Potential. In P.J. White (Ed.) Child Voice. Stockholm: Royal Institute of
Technology. [in press].
Welch, G.F. (2000b). 'The Ontogenesis of Musical Behaviour: A Sociological Perspective'. Research
Studies in Music Education. [in press].
Welch, G.F., Sergeant, D.C. & White, P. (1996). 'The singing competences of five-year-old
developing singers'. Bulletin of the Council for Research in Music Education, 127, 155-162.
Welch, G.F. & Thurman, L. (2000). Vitality, health, and vocal self-expression in older adults. In L.
Thurman and Welch, G.F. (Eds.). Bodymind and Voice: Foundations of Voice Education. [2nd
Edition]. Iowa: National Center for Voice and Speech.
Young, S. (1999). Just Making a Noise? Reconceptualizing the Music Making of Three and Four Year
Olds in a nursery setting. Early Childhood Connections. 5(1), 14-22.
Proceedings paper
Abstract
The presence of girls in English cathedral choirs is becoming increasingly commonplace, but there are those who believe that they are unable to carry out
this role appropriately in this traditionally male-dominated arena. It is suggested by some that girls are unable to produce a sound that is in keeping with the
musical traditions of the choral sung divine offices. The aim of this paper is to explore whether or not listeners can tell the difference between the blended
sound of boys and girls in a cathedral musical context. A perceptual experiment was conducted to determine the extent to which listeners can tell whether
boy or girl choristers were singing the top line in snippets from professional English cathedral choir recordings where the lower three parts and the acoustic
environment remained essentially constant. Overall, the results suggest that listeners can tell the difference between girls and boys, and that this difference
is statistically significant. In addition, the data indicate that this ability improves with age and that girls are more accurate than boys. It is also clear from
the data that identification abilities vary between some of the musical settings selected as stimuli.
Background
The English cathedral choir has for centuries been the preserve of male singers only with the upper musical line being sung by boys, or trebles, in
pre-pubescence. The traditional musical repertoire composed for sung divine offices has therefore been written for a top line sung by boy trebles and there
are those who would argue that their sound is therefore the one specifically desired by composers. As girls have been admitted into cathedral choirs over the last
five years or so to provide an alternative top line to the trebles, a debate has been provoked over whether or not girls can provide a sound that is appropriate
in the musical context of the English cathedral sung divine office. In general, it is either the girls or the boys who sing the top line for the regular divine
offices, allowing the boys to sing fewer services per week than previously, although they might well be brought together for concerts or large festivals.
Whilst discussions over the appropriateness of the place of girls in cathedral choirs might for some have more bearing on sexist rather than purely musical
issues, important musical questions are posed given that there are basic differences between boys and girls in this age range in terms of their vocal
development which relates to adolescent physiological change.
In order to explore whether or not listeners can tell the difference between boys and girls singing the top line in choral music, a number of potential
variables require consideration. These include the presence or absence of organ or other accompaniment, the potential influence of other sung parts (usually
alto, tenor and bass), the acoustics of the environment, and the recording conditions. In addition, the musical repertoire chosen could influence listener
decisions. Professional recordings of one particular cathedral choir were traced in which the majority of these variables remained relatively constant, and
these formed the basis for the listening material. The remainder of this paper describes the data preparation, the conduct of the tests, the results obtained
and their statistical analysis, and draws conclusions.
Method
The purpose of the perceptual listening test was to investigate whether or not listeners can perceive the difference between boys or girls singing the top line
of snippets of traditional cathedral choral music. The musical material was taken from two professionally recorded compact disks (CD) of one English
Cathedral choir. The girl or boy choristers sang with the lay clerks and were recorded in the cathedral on each disk respectively. The material therefore
exhibited minimal variability in terms of acoustic factors, other than those arising from the top line itself, that could cue differences between the two
recordings. The musical repertoire was, however, different. Indeed, it is highly unlikely that professional recordings would be made
of identical pieces of music in such circumstances. The recordings are of Wells Cathedral choir. The CD with the boy choristers singing the top line was
recorded in March 1998 and is titled 'The Glorious Renaissance' (No. GCD4019). That involving the girl choristers is 'I look from afar' (No. LAMM102D)
which was recorded in November 1997.
In order to keep the listening test relatively short with a view to gaining a large number of responses, it was decided to select ten snippets each of boys and
girls singing the top line, and each snippet was to last approximately 20s. This gives a total listening time of less than 7 minutes. The snippets themselves
were chosen such that they contained a fairly continuous top line throughout. In some cases, organ accompaniment was used.
The snippets themselves were extracted from the relevant CD in stereo via a SoundBlaster Gold card using the Goldwave waveform editing package
running on a standard PC computer. Each was edited to last approximately 20s, and then amplitude normalised to keep the volume levels on replay similar
for each. The onset and offset amplitude envelopes were linearly ramped (from 0% to 100% at the onset, and from 100% to 0% at the offset) in order to avoid audible clicks at the start and finish of each
CD test track. The 20 snippets were archived in a random order onto recordable audio compact disks from which the tests themselves were played using
standard stereo audio equipment. A response sheet was produced that requested the age and sex of the listener, and a tick-box response against boys or girls
for each of the 20 snippets with the instruction: Please indicate whether you think boys or girls are singing in the choral snippets.
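The normalisation and ramping steps can be sketched as follows. This is a reconstruction of the general procedure only, not the authors' actual Goldwave processing; the 50 ms ramp duration and 0.9 peak target are assumptions, since the paper does not state them.

```python
import numpy as np

def prepare_snippet(samples, sr=44100, ramp_s=0.05, peak=0.9):
    """Peak-normalise a snippet and apply linear fade-in/fade-out.

    samples : 1-D float array of audio samples
    ramp_s  : ramp duration in seconds (an assumed value; the paper only says
              the envelopes were ramped linearly from 0% to 100%)
    peak    : target peak amplitude after normalisation (also assumed)
    """
    out = samples.astype(float).copy()
    out *= peak / np.abs(out).max()   # amplitude normalisation for similar replay levels
    n = int(sr * ramp_s)
    ramp = np.linspace(0.0, 1.0, n)
    out[:n] *= ramp                   # linear fade-in avoids an onset click
    out[-n:] *= ramp[::-1]            # linear fade-out avoids an offset click
    return out

# A two-second noise burst stands in for an extracted CD snippet.
snippet = prepare_snippet(np.random.default_rng(1).standard_normal(44100 * 2))
assert snippet[0] == 0.0 and snippet[-1] == 0.0
```

The same treatment applied to every snippet keeps replay level and onset/offset behaviour from becoming an unintended cue.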
The listening tests were conducted with groups of listeners in different acoustic environments using standard CD replay equipment. However, in every case
the listening conditions remained constant during each test with the listeners sitting in a fixed position relative to the loudspeakers. In total, 189 listeners
agreed to take part in the experiment, of whom 81 were female and 108 male, and their distribution with respect to sex and age is shown in table 1.
Sex      (counts per age band, youngest to oldest)      Total
Male     11   36   23   9   21   5   2   1              108
Female   19   22   16   8   13   2   1   0               81
Data analysis
For any experiment involving psychoperceptual judgements it is important to ensure not only that enough data is gathered to provide a fully representative
statistical sample of the population as a whole, but also that all possible sub-populations are well sampled. Further, an appropriate range of stimuli
must be selected in order to reduce possible researcher bias at the design level.
In this study, the experimental design first required that each listener must give a specific answer - boys or girls. Null or "don't know" answers were
forbidden in advance. Further, all listeners were exposed to the various stimuli in a similar acoustic environment and were unaware of the accuracy of their
previous responses. This means that each individual listener response forms a Bernoulli trial and that the group responses to any one specific stimulus
forms a sequence of independent Bernoulli trials with a constant probability, p, of success. Under such circumstances the total number of successful
responses over n trials, X, is a random variable which follows a binomial distribution:
P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}    (1)
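As a numerical illustration of the binomial form in equation (1), the probability mass function can be computed directly with the standard library. This is a sketch for the reader, not part of the original analysis.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)
```

For example, under pure guessing (p = 0.5) the probability that exactly 94 of the 189 listeners identify a given stimulus correctly is `binomial_pmf(94, 189, 0.5)`.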
However, to allow a reasonable sampling of the population of stimuli, the experimental design allowed for the initial trials to be carried out with 20
different stimuli, ten of each sex. There existed the implicit assumption that the designers neither had a priori knowledge of the relative difficulties of
correctly identifying any of the stimuli, nor of each of the sex subgroups, nor of any known probability density function describing the relative difficulties
for the full population of appropriate stimuli. Hence each different stimulus will in general be represented by a different binomial distribution, for n_i trials
of the i-th stimulus, with neither the individual probability P_i nor their probability distribution known in advance, as follows:
P(X_i = x) = \binom{n_i}{x}\, P_i^{x} (1 - P_i)^{n_i - x}    (2)
The overall characteristics of this design are that the 20 stimuli represent a reasonable compromise between the limitations of pragmatic experimental
logistics and the need to provide an adequate sample of the population of suitable stimuli. In combination with 189 listeners therefore, an overall total of
3780 success/fail Bernoulli trials are available for statistical analysis. The resulting dataset is large enough to allow sensible analyses of sub-populations
which can be grouped not only by listener age and/or sex, but also by their relative success in recognising the voices of boys or girls within the stimuli.
However, the full set of 3780 results and any sub-grouping which includes counts over more than one stimulus, will consequently not follow a simple
distribution - in general it will instead have a more complicated composite form, as it arises from the sum of binomial distributions of different means and
variances.
Hence, in analysing the success counts, it is not appropriate to use any statistical tool that presumes normality or makes any other distribution-specific
assumptions. In such circumstances, it is common to use an approximation based on the argument that although the underlying probability distribution is
unknown, any calculated sample mean is itself a random variable which the central limit theorem allows us to assume will follow a normal distribution - an
approximation which improves as the size of the sample is increased. With the further assumption that the measured sample standard deviation is
representative of the population standard deviation, the confidence interval for the true population mean µ is then given approximately by:
\bar{x} - z \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z \frac{s}{\sqrt{n}}    (3)
where n is the dataset size, \bar{x} is the sample mean, s is the sample standard deviation and z is the point of the standard normal distribution corresponding to the chosen confidence level.
A confidence level of 99% was chosen for the current work, which is equivalent to a value of z = 2.575. This approach allows the calculation of sample
means and standard deviations for all desired sub-populations as well as the estimation of 99% confidence limits on those means. The inclusion of the
group size within the calculation ensures that the calculated confidence limits correctly reflect the underlying group size, thereby avoiding dubious
conclusions being drawn from inappropriately small data sets.
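The interval in equation (3), with z = 2.575 for the 99% level used here, can be sketched as follows; the success/fail Bernoulli responses are assumed to be coded as 1s and 0s.

```python
from math import sqrt

def confidence_interval(data: list, z: float = 2.575):
    """Approximate CI for the population mean: x-bar +/- z*s/sqrt(n),
    using the sample standard deviation s as the population estimate."""
    n = len(data)
    mean = sum(data) / n
    var = sum((d - mean) ** 2 for d in data) / (n - 1)  # sample variance
    half_width = z * sqrt(var) / sqrt(n)
    return mean - half_width, mean + half_width
```

Because the group size n appears in the half-width, small sub-groups automatically get wide intervals, which is exactly the safeguard against over-interpreting small datasets that the text describes.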
Results
Figure 1 shows a histogram of the number of correct identifications given by subjects; it can be seen that the mode is 12 correct responses. It is
interesting to note that three listeners achieved 19 correct responses whilst one achieved just 6. The most important aspect of these data is the
considerable variation in listeners' overall ability to label individual stimuli correctly, as indicated in the analysis discussion above.
Table 2 provides details of the percentage (rounded) of correct listener responses to each stimulus. It can be seen that there is considerable variation
between the stimuli themselves, suggesting that some were rather easier to identify (e.g. stimuli 3, 4, 15, 17, 20) compared to others (e.g. stimuli 1, 2, 6, 8,
14, 16). These considerations informed the choice of statistical test employed as described above. These data are plotted in the form of a histogram in figure
2 which serves to emphasise the variation between the stimuli themselves. It is clear that stimulus 16, which was correctly identified by only 37% of the
listeners, could well contain some strong acoustic 'anti-cue' to the identity of the sex singing the top line.
Figure 2: Histogram of the number of correct listener responses (%) to each stimulus.
Table 2: Percentage (rounded) of correct listener responses to each stimulus.
Stimulus:     1   2   3   4   5   6   7   8   9  10
% correct:   43  49  73  76  57  42  63  45  72  65
Stimulus:    11  12  13  14  15  16  17  18  19  20
% correct:   62  68  65  46  77  37  73  74  50  74
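As a quick check on the Table 2 figures, the overall hit rate and the extreme stimuli can be computed directly (illustrative only):

```python
# Percentages of correct listener responses for stimuli 1-20 (Table 2).
pct_correct = [43, 49, 73, 76, 57, 42, 63, 45, 72, 65,
               62, 68, 65, 46, 77, 37, 73, 74, 50, 74]

mean_pct = sum(pct_correct) / len(pct_correct)
print(mean_pct)              # 60.55 - well above the 50% expected by chance

hardest = pct_correct.index(min(pct_correct)) + 1   # 1-based stimulus number
easiest = pct_correct.index(max(pct_correct)) + 1
print(hardest, easiest)      # stimulus 16 (37%) hardest, stimulus 15 (77%) easiest
```

The 37% figure for stimulus 16 is the only one below chance, consistent with the suggestion that it carries some acoustic 'anti-cue'.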
Table 3: Statistical analysis of data by listener sex (male listeners, n=108; female listeners, n=81; all listeners, n=189).
Table 4: Statistical analysis of data by listener age (child, 0-17 years, n=51; adult, over 17 years, n=138; all listeners, n=189).
A statistical analysis was carried out to investigate three questions with respect to boys and girls singing the top line:
1. Can listeners tell the difference?
2. Does the sex of the listener affect the ability to tell the difference?
3. Does the age of the listener affect the ability to tell the difference?
In order to investigate these questions, the sample means and 99% limits were calculated for:
● male listeners and female listeners (table 3), and
● child listeners and adult listeners (table 4).
Statistical significance at the 0.01 level can be ascertained with reference to the 99% limit values. If the mean plus or minus the 99% limit given does not
overlap the mean with which it is being compared and vice versa, then the difference is significant at the 0.01 level.
In regard to the first question posed, it can be observed that all listener groups represented have an ability that is better than chance at the 0.01 level to
identify the sex of the choristers singing the top line. It should be noted in addition that for all listener groups given in tables 3 and 4, boys are recognised
more often than girls when singing the top line, and this is statistically significant at the 0.01 level.
In answer to the second question, there is no statistically significant difference between the abilities of the male and female listeners to identify who is
singing the top line.
The third question has been investigated by splitting the listener group into children and adults by taking under eighteen year olds as children. These data
are given in table 4, and it can be observed that there is a statistically significant difference at the 0.01 level between the identification abilities of adults and
children in identifying either boys or girls. In all cases, the adults recognise the sex of the choristers singing the top line more often than the child
listeners. In addition, this significant difference is also exhibited for all stimuli between the adult and child listeners, whereas this is not the case between
male and female listeners (see table 3).
Discussion
It is clear from these data that there is no single obvious acoustic feature that enables listeners to distinguish boy and girl choristers singing
the top line in choral music. However, it has been shown that listeners can distinguish differences between individual untrained boy and girl singers from
approximately the age of 8, and that this ability becomes more accurate as the age of the singers increases (Welch, 2000). In a study of unison singing
reported in Sergeant and Welch (1997), it was found that on average listeners could not do better than chance in identifying a group of unison boys from
unison girls. However, they indicate that the discrimination ability varied depending on the stimulus; similar comments might be made about the variation
exhibited in figure 2. In these experiments, the music used was all from the cathedral repertoire and it is not believed that there was any feature of the
selected material that carried obvious identity clues (e.g. all hymns by boys and all anthems by girls).
Listeners are able, on average, to discriminate between girls and boys singing the top line, and all were better at identifying the boys than the girls. Perhaps
this is due to the cathedral tradition of all male choirs which almost all listeners will be familiar with, even if it is only through listening to broadcasts of
large state occasions such as the Coronation, Royal Weddings, State Funerals or Carols from King's. Are there some acoustic cues inherent in the blended
boy chorister sound that provide the listeners with an acoustic 'fingerprint' or hallmark which the blended girl chorister sound either cannot achieve, for
physiological reasons, or has yet to achieve? Clearly, analysis of the acoustic properties of the snippets used in this test could provide some clues.
Adult listeners are able to identify the sex of the choristers more often than children listeners which suggests that there may be an issue of either familiarity
with the sound or listening experience. Could direct working familiarity with the blended sound of cathedral choirs be an important issue here? In order to
understand better this effect, it might be appropriate to carry out further listening tests with ex-choristers, current lay clerks, cathedral organists and
choirmasters as opposed to listeners who have no direct working experience with cathedral music. On the other hand, this could be a result of increased
listening experience emerging from 'lifelong learning' through broadcasts, recorded music, concerts and sacred services. Perhaps it arises from an instinct to
be aware of and look after our young which provides a heightened sense of the differences between the vocal outputs from boys and girls.
The analysis presented in this paper poses a number of further questions to ask of the data in terms of whether there are finer structures that might be
observed between listeners of different ages and sex, and the extent to which any trends therein are common to both girl and boy chorister stimuli. It is also
clear from the analysis to date that some stimuli are easier to identify than others and the snippets employed in the test will be subject to acoustic analysis,
both standard spectrography (e.g. Baken, 1987) and hearing modelling spectrography (e.g. Howard et al., 1995) the results of which can be considered in
terms of timbre (e.g. Howard and Tyrrell, 1997). Identification of acoustic differences is likely to lead to a greater understanding of the effect of chorister
training. This could in turn lead to the development of real-time feedback devices for use in practice as support tools in the vocal training process.
Acknowledgements
The authors would like to thank all the listeners who took part in these tests.
References
Baken, R.J. (1987). Clinical measurement of speech and voice, London: Taylor and Francis.
Howard, D.M., Hirson, A., Brookes, T., and Tyrrell, A.M. (1995). Spectrography of disputed speech samples by peripheral human hearing
modelling, Forensic Linguistics, 2, (1), 28-38.
Howard, D.M., and Tyrrell, A.M. (1997). Psychoacoustically informed spectrography and timbre, Organised Sound, 2, (2), 65-76.
Sergeant, D.C. and Welch, G.F. (1997). Perceived similarities and differences in the singing of trained children's voices, Choir Schools Today, 22,
9-10.
Proceedings paper
The VRP recordings followed the recommendations of the Union of European Phoniatricians (Schutte &
Seidner, 1983). However, SPL was measured using a flat frequency curve rather than dB (A). The
microphone distance was 30 cm. The subjects were asked to sing as softly and as loudly as possible at each
pitch on the vowel [α]. A synthesiser, CASIO SA-20, was used to give reference pitches. Whenever
required, the experimenter also sang the pitches.
Data from a previous study of boys of the same age are used for comparison (McAllister et al, 1994).
Results
Nineteen of the 22 girls could be inspected using indirect microlaryngoscopy, see Table 1. In 10 children
stroboscopy was also used. Seven girls had incomplete glottal closure, i.e. posterior glottal chinks, and three
had complete closure all along the vocal folds according to the stroboscopic examination. No girl in the
present study had vocal nodules and/or hourglass chinks.
Table 1. Results from glottal inspection with and without stroboscopy. Note that the vocal folds of three
girls could not be inspected.
                     (a)  (b)  (c)  (d)  (e)  Total N
Micro-laryngoscopy    3    1    9    -    6      19
Stroboscopy           3    -    7    -    -      10
(a) complete glottal closure; (b) incomplete glottal closure all along the vocal folds; (c) incomplete glottal closure along the posterior half or two
thirds; (d) hourglass chinks; (e) inspected but closure not evaluated.
The perceptual evaluation showed that two girls were slightly hoarse. However, as can be seen in Figure 1
the girls had a lower mean hoarseness value than the boys in their peer group (girls' mean: 23.1 mm; boys'
mean: 33.4 mm rated hoarseness). The perceptual evaluation rated 13 girls
as free of hoarseness and any related voice dysfunction. One girl had a mutational voice according to a
growth index (Taranger et al, 1976) and the perceptual evaluation.
Figure 1. Rank ordered mean hoarseness values for girls (unfilled squares) and boys (filled diamonds). Note
the knee in the distribution around 40 mm for both boys and girls.
All 22 girls could complete the VRP recording with at least an octave in range. The mean fundamental
frequency range in semitones for these girls with no singing experience apart from that provided within the
regular school system was 25 st, from G3 to G#5 (Giss5) (196-830 Hz); see Table 2. The one girl with a mutational
voice had a somewhat larger fundamental frequency range than her peers, 33 st.
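The reported 25-semitone range can be checked from the frequency endpoints, since the interval between two frequencies f1 and f2 in equal-tempered semitones is 12·log2(f2/f1). This is a verification of the quoted figures, not part of the original analysis.

```python
from math import log2

def semitone_range(f_low: float, f_high: float) -> float:
    """Interval between two frequencies in equal-tempered semitones."""
    return 12 * log2(f_high / f_low)

print(round(semitone_range(196.0, 830.0)))  # 25 st, i.e. G3 up to G#5
```

The same formula confirms that 830 Hz corresponds to G#5 (Giss5 in Swedish note naming), roughly a quarter-tone above the exact equal-tempered G#5 at 830.6 Hz relative to A4 = 440 Hz.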
The mean maximum dynamic range for the whole group, defined as the difference between the upper and
lower VRP contours at a given fundamental frequency, was 20.7 dB. Only very minor differences could be
observed between the different voice-groups. However, regarding mean F0 range the girls with glottal
chinks had a somewhat larger F0 range in semitones (st) than the controls.
The averaged VRP for 10 boys of the same age-group with normal voices as compared to that of the present
11 girls with normal voices showed that the lower contour of the girls was somewhat lower than that of the
boys, see Figure 2. This may reflect the willingness of the vocal folds to vibrate at low driving pressures.
Regarding the upper contour the boys had higher values practically throughout the frequency range
indicating an ability in the vocal folds to vibrate at higher subglottal pressures.
Figure 2. Mean VRP-data for 10 boys (filled diamonds) and 10 girls (unfilled squares) with normal voices.
Discussion
Children have been reported to have a somewhat elevated lower VRP curve as compared to adult voices and
sometimes a more restricted dynamic range (Kotby et al., 1995). In the present study of 10-year old girls the
lower contour was similar to that found in adult female voices (Coleman, Mabis, Hinson, 1977; Gramming,
1991).
All girls could complete the VRP-recording. However, in the previous study of boys of the same age 28%
could not sing the desired pitch and thus could not complete the VRP recording.
Pedersen and co-workers (1986; 1987) found that register transitions manifested in the VRP as a dip of 5 dB or
more in the upper contour. This could not be confirmed in the present investigation.
Conclusions
VRP analysis appears to be a useful method for evaluation of ten-year old children's voices. All girls could
produce data for a VRP recording. As compared with adult women, girls seem to have somewhat
compressed dynamic VRP contours reflecting restricted dynamic vocal capabilities.
Key Words: Voice range profiles, girls' voices, pitch range, mutation, dynamics.
References
Coleman RF, Mabis JH, Hinson JK. Fundamental frequency-sound pressure level profiles of
adult male and female voices. J Speech Hear Res 1977;20:197-204.
Gramming P. The Phonetogram. An Experimental and Clinical Study, Diss., Dept. of
Otolaryngology, Malmö General Hospital, Sweden: Lund University, 1988.
Gramming P. Vocal loudness and frequency capabilities of the voice. J Voice 1991;5:2:144-57.
Klingholz F, Jolk A, Martin F. Stimmfelduntersuchungen bei Knabenstimmen (Tölzer
Knabenchor). Sprache-Stimme-Gehör 1989;13,107-11.
Klingholz F, Martin F, Jolk A. Die Bestimmung der Registerbrüche aus dem Stimmfeld.
Sprache Stimme-Gehör 1985;9:109-11.
McAllister, A., Sederholm, E., Sundberg, J., Gramming, P. Relations between Voice Range
Profiles and Physiological and Perceptual Voice Characteristics in Ten-year-old children. J
Voice 1994, 3:230-239.
Ohlsson A-C, Järvholm B, Löfqvist A. Vocal symptoms and vocal behaviour in teachers. Scand
J Logoped Phoniat 1987;12:61-9.
Pabon P, Plomp R. Automatic phonetogram recording supplemented with acoustical voice
quality parameters. J Speech Hear Res 1988;31:710-22.
Pedersen MF, Munk E, Bennet P, Møller S. The change of voice during puberty in choir
singers measured with phonetograms and compared to androgen status together with other
phenomena of puberty. Proceedings of the Tenth International Congress of Phonetic Sciences,
1983:604-608.
Pedersen MF, Møller S, Krabbe S, Munk E, Bennet P, Kitzing P. Change of voice in puberty in
choir girls. Acta Otolaryngol (Stockholm) 1984;Suppl. 412;46-9.
Schutte HK, Seidner W. Recommendation by the Union of European Phoniatricians (UEP):
Standardizing Voice Area Measurements/Phonetography. Fol Phoniat 1983;35:286-88.
Sederholm E, McAllister A., Sundberg J, Dalkvist J. Perceptual analysis of child hoarseness
using continuous scales. Scand J Logoped Phoniat 1993;18:73-82.
Taranger J, Engström I, Lichtenstein H, Svennberg-Redegren I. The somatic development of
children in a Swedish urban community. A prospective longitudinal study. VI. Somatic
pubertal development. Acta Paediatr Scand 1976;Suppl. 258.
Proceedings abstract
Leon Thurman, Ed.D., Fairview Voice Center, Fairview-University Medical Center, Minneapolis, Minnesota,
USA
Background
During prenatal development, human biological processes evolve a unique bodymind. Newborn human
bodyminds possess a unique array of primary capability-ability clusters that gradually process experiences
into the formation of a unique neuropsychobiological self. Three interrelated primary capability-ability
clusters are predominantly involved in self-determination: (1) interactive-expressive, (2) imitative, (3)
exploratory-discovery. The neuromuscular coordinations that produce voice are an integral part of the
interactive-expressive cluster, and the other two clusters are prominently used in the development of spoken
and sung self-expressive abilities. During childhood, adolescence, and adulthood, patterned genetically
triggered brain growth spurt cycles occur that amplify those capability-ability clusters and bring new ones
on-line. Human capabilities can be converted into optimal abilities only with optimal environmental support.
Aim
This paper argues that (1) all human beings who have relatively normal vocal anatomy and physiology are
capable of learning skilled, expressive speech prosody and singing abilities, and (2) optimum development
of these abilities can play a major role in the development of empathic social relatedness, constructive
personal competence, and self-reliant autonomy, primary characteristics of constructive selfhood.
Main Contribution
The neuropsychobiological benefits of self-expressive speaking and singing will be presented, along with
decrements to self-identity formation that can result from lack of, or suboptimal development of,
self-expressive speaking and singing. Evidence in support of the paper's contributions will be from the
neuropsychobiological sciences and from case histories of voice disordered patients and voice education
clients at Fairview Voice Center.
Implications
Evidence cited in this paper may be used to: (1) challenge social myths about the genetic heritability of
"good speaking voices" and "good singing voices", (2) point parents, general educators, and music
educators toward ways to optimally support the development of self-expressive speaking and singing skills,
and thus, enhance the development of constructive selfhood, that is, empathic social relatedness,
constructive personal competence, and self-reliant autonomy.
Proceedings paper
In this paper, a sketch will be presented of the various avenues of thought that have led to the development
of a theoretical framework that allows for the investigation of the psychological organisation underlying
musical listening. The model is based, first, on the Gestalt principles and the contribution of Lerdahl and
Jackendoff to the theory of grouping; and then, on the opposition between the idea of SIMILARITY which
links objects together and that of DIFFERENCE which distinguishes objects from each other. These
oppositions have led to the development of some new concepts -cue abstraction and the formation of
imprints - which have provided us with an approach to the study of the activity of categorisation that
underlies the mental organisation of musical information.
The Gestalt heritage
It is nowadays well-known that the Gestalt laws formalize, in essence, a spontaneous and unconscious
tendency of human psychological mechanisms to define units in the perceptual field generated by the
properties of objects and the relationships between them: proximity, similarity, common fate and good
continuation. These factors determine how a perception is segmented and how it is organized into groups
by generating boundaries between regions.
The goal of the first chapter of Lerdahl and Jackendoff's A Generative Theory of Tonal Music (1983) was to
present a systematic formalisation of the general laws of perception as applied to musical rhythm. This
theory has already been described on more than one occasion (e.g. Deliège, 1987a); there is therefore little
point in dwelling on it here.
As I pointed out in a study on this subject published in 1987, a common idea, that can be applied regardless
of the grouping principle involved, accounts for the emergence of boundaries within a total structure: a
grouping boundary is invariably perceived when there is a perceived difference between the groups
adjoining the boundary, as opposed to a similarity between the elements within the groups (Deliège 1987a).
The perception of SIMILARITY and DIFFERENCE
This causes us to turn our attention to the rôle of similarity in perceptual processes, one of the most
important problems in contemporary cognitive psychology (as emphasized recently by Jean-François Le
Ny (1997)). The importance of the idea of SIMILARITY and, through this idea, the various degrees of
equivalence that follow from it - identity, repetition, invariant-variant relationship - has been described
many times in the context of musical experience and practice by composers, theorists and music
psychologists as diverse as Schoenberg (1967), Webern (1933/1980), Souris (1976), C. Deliège (1984,
Chapter VI) and Imberty (1997). Incidentally, the subject is still topical in the field of psychology if one is
to judge from the number of recent meetings, symposia and publications dedicated to this subject.
Consider, for example, Simcat 97, the Interdisciplinary Workshop on Similarity and Categorisation
organised at the University of Edinburgh in November 1997; and the special issue of the review Computing
in Musicology devoted to melodic similarity, published recently by MIT Press.
Also, in the context of an experimental study that I carried out in 1991 (I. Deliège 1991) on the perception
of invariance/variance that took Steve Reich's Four Organs as a starting point, my conclusions converged
with those from ethnomusicology. According to Gilbert Rouget (1990), the perception of SIMILARITY
could be more developed in certain ethnic groups. For example, Simha Arom (1985) encountered in Central
Africa concepts of similarity and of identity for which the perceptual scales were more flexible than our
own and went beyond what we would customarily include under these headings in our daily perceptual
experience. In particular, he cites among others two musical fragments - one rhythmic, the other melodic -
as examples of the sense of similarity experienced by African musicians (see Figure 1). The rhythmic
sequences are, it seems, judged to be the "same" when each has the same total number of events, the
arrangement and rhythmic articulation of these events being insignificant. As for the two melodic
sequences, these are perceived to be "identical" because they have a similar melodic contour and because
each only uses notes from a single pentatonic scale. For all that, the idea of universality that is recognized
as being a feature of the problem of SIMILARITY in the field of psychology is neither breached nor
reduced, but we have just pointed out the effect of cultural milieu as a factor that could influence the ways
in which psychological mechanisms adapt to their musical environment: the analysis and understanding of
these processes could therefore force us to introduce tools that are better adapted to the problem than the
Gestalt laws alone.
Figure 1: The two musical fragments (one rhythmic, one melodic) cited by Arom as examples of perceived similarity.
It was these ideas that led me, a decade or so ago, to the development of a model based on the Principles of
Similarity and Difference (I. Deliège 1989) as the organising principles underlying musical listening. These
principles arose directly from the Gestalt principles of which they constitute an extension, or, more
precisely, a generalisation insofar as the psychological mechanisms concerned, previously analysed in
terms of proximity, similarity, common fate, good continuation etc., are henceforth divided only into two
categories: that of Similarity which proposes that small differences between elements within a constituent
are minimised; and that of Difference, which assumes that the contrast between elements adjacent to a
boundary are emphasized. The Principles of Similarity and Difference are thus defined, on the one hand as
laws of description and structural analysis of musical perception, while, on the other hand, their function is
based on a preliminary act of categorisation, that is to say a line of thought that focuses on the explanation
of the underlying mental processes. I want to emphasize this latter aspect in particular because it distances
the model from its Gestalt heritage, whilst also developing it: the first priority of the Gestalt theorists was to
emphasize the importance of the structural relationships that govern perception and to express the
principles underlying these structural relationships in the form of laws that were supposed to respect the
holistic character of perceived totalities; but, as Johnson-Laird has pointed out, the description of mental
processes was not necessarily privileged (1988/1993, p.23). This aspect of cognitive processes that divides
up structures according to the categories provided by the proposed principles turns out to be essential in the
study of the formation of a mental schema of a musical work.
References to theories of categorisation
Recently, George Vignaux wrote (1999, p.13):
Similarities allow us to group objects together, differences allow us to set them apart. Meaning
arises from this double-game of similarity (which gathers together families, species and
moments in time) and difference (which sets families, species and moments apart either in
time or in form.)
Thus, the main ideas of Similarity and Difference are recognized once again as being fundamental to the
activity of thinking about the world, organising it and classifying the objects and phenomena that one finds
in it. It remains only to specify how one can gain access to these concepts.
For Bruner, "all perceptual experience represents the final product of a process of categorisation" (1958,
p.3). In other words, when faced with the daily environment, an individual refers to the knowledge that he
has acquired and stored in memory and classifies what he perceives according to the collection of
categories that he has already developed. On the other hand, an unfamiliar environment arouses a degree of
uncertainty inherent in novel experiences. In this case, the product that Bruner talks of gradually becomes
reality through intermediate attempts at classification - a play of equivalences and comparisons - that
continue until the need to understand the structures encountered succeeds in assimilating these new
structures to the world that the perceiver has internalised:
J.S. Bruner wrote that an object, an event or a sensation that is unclassifiable...is a
phenomenon that is so rare that the possibility of such a phenomenon existing is doubtful. If a
perceptual experience could be so virginal and bereft of all categorisation, then it would be a
pure diamond enclosed in the silence of internal experience (ibid., p.4).
One can easily imagine how such operations may be organised in the case of everyday experiences, but the
case of musical listening raises problems of a quite different nature. To propose the hypothesis that the
comparisons and classifications of structures are generated during the listening process seems rather
inexplicit so long as their object has not yet been identified. In other words, it is essential that the terms on
which the comparisons are made be specified if one wishes to provide a valid description of the process by
which a subject achieves an understanding of the musical environment during listening.
The dimension of time in which the work unfolds poses a particular problem. Yet, the idea that such
processes may intervene, at the conclusion of the listening process, with reference to certain incidental,
scattered memories, cannot be supported. Indeed, being able to remember events requires that the mental
schema starts being built up at the beginning of the listening process and progresses and evolves over the
course of listening to the work. Modelled on the theory of discourse perception advanced by Bartlett and
later by Kintsch and van Dijck, namely a focussing of attention on a selection of key elements that reduces
the complete collection of events to a size that can be managed by memory, I have proposed since 1987 the
hypothesis that an analogous selection process - that I call cue abstraction - operates over the course of
musical listening (I. Deliège 1987b, 1989, 1991). However, as the product of such a selection process in
this particular case lacks semantic reference, it is necessary to specify its substance.
A cue is a salient element which is prominent in the musical surface. This idea of key structures that play a
foreground rôle in the whole musical work, is undoubtedly an example of the idea of Figure/Ground
discrimination applied to the perceptual organisation of musical structures, an idea that again brings us
close to the Gestalt-influenced model. But beyond its connections with the original structures, a cue
generates different cognitive strategies by virtue of the very fact that it is "emergent" - a property that
confers upon it, cognitively speaking, a clearer definition with respect to the rest of the musical
environment. As I have pointed out elsewhere (I. Deliège 1989, p.307), the idea of a cue that is presented
here refers to that aspect of a sign that Charles Peirce talks about when he defines a cue (or index) as "a
sign that refers back to the object that it denotes because it is in a real sense affected by this object" (2.248,
p.140). This idea was echoed by Ignace Meyerson (1948/1995) for whom a cue has a close and permanent
connection with its original structures: "it is a fragment of this reality ... bound to the fragments that it
evokes by natural ties" (p.77).
As soon as it has been abstracted, the cue plays an active rôle on more than one count in the listening
process. In the first place, it has caught the attention of the listener and thus becomes fixed all the more
effectively in long-term memory; but, in conjunction with the storage mechanism, it "summarizes" the
sequences from which it arose into a succinct representation, a sort of label, that lightens the memory load
required to internalize the whole structure. Musical time is thus progressively marked out, the different cues
that are abstracted during listening acting as waymarkers or milestones. This gives rise to the notion of a
mental line which makes reference to a symbolic "musical space" in which the fundamental articulations of
the mental schema of the work are drawn. In addition, a cue always acts in concert with the other main axes
of the model - the Principles of Similarity and Difference - in the context of which a cue acts as a
cornerstone in the process of musical categorisation, a rôle that confers upon it its primary dynamic
function. The cue therefore provides us with the basic point of reference for the comparisons between
musical structures that occur throughout the listening process.
Over the past ten to fifteen years, the theoretical aspects of the model have been investigated
experimentally in various ways, the principal areas of study being processes of segmentation, categorisation
strategies and the organisation of long-term memory in the mental schema. This research has been carried
out on a collection of works chosen from different periods of the musical repertoire (Bach, Schubert,
Wagner, Debussy, Reich, Berio, Boulez, etc.). Among the most significant contributions of this work is the
discovery that categorisation is always observed to act in the case of musical listening, even when the
studies have not been directly aimed at studying the categorisation processes in question. The results of
studies on segmentation show this explicitly (I. Deliège 1989, 1990, 1997): a cue acts as a driving force in
combination with the Principle of Similarity, which groups the structures established at the beginning of the
model, while the Principle of Difference intervenes to signal the introduction of a new cue structure and
the beginning of a new group that will be developed along similar lines. Finally, an insistence upon using
the same cue for as long as the composer employs it, through literal repetitions or more or less
varied elaborations, generates a prototype figure, the imprint, in cognition. That is, human memory cannot
record in detail all the varied ways in which a cue is presented and so it finds a kind of "mean" that captures
the main features of all the presentations whilst still effecting that simplification of the musical environment
that is common to all processes of categorisation (Gineste 1997, p.95). Besides the importance of change,
the stress is placed here on Similarity and focuses on what happens between the boundaries defined by the
principle of Difference. Thus, with the intervention of processes of categorisation, the schema of the
complete piece is built up - a schema where the categories of Similarity take priority in the development of
musical time, the principle of Difference operating only briefly at certain points to punctuate the boundaries
between groups.
At this point in my exposition and in relation to the idea of psychological constants that I have stressed
since my earliest work, it is interesting to note a parallel between the fundamental principles of the theories
of Eleanor Rosch (1978) and the various articulations of the model based upon the Principles of Similarity
and Difference that have just been described. In effect, this constitutes a musical application that is quite
close to that proposed by Rosch for the case of categorisation of the environment and thus suggests the
existence of deep analogies between the cognitive foundations underlying information processing in quite
different domains.
With respect to the concept of category, Rosch defines two dimensions: horizontality and verticality.
Horizontality consists of arranging the members of a collection along a scale, for example, all the possible
variations of the same motif. This generates the concept of prototypicality which specifies the "central"
member of a category - the prototype - which is the best representative of the category as a whole. On the
other hand, verticality provides a hierarchical perspective on the cognitive organisation in question. At a
so-called basic level, the elements are mutually independent: they each have their own characteristics but
belong to the same higher, superordinate level. Finally, the lower or subordinate level can be understood as
being a sort of equivalent of the idea of horizontality, described above, inasmuch as it involves the use of a
single basic element in all the desired variants. For example, the superordinate level mammals contains, at
the basic level, elements such as dog and cat; the subordinate level being occupied by the different breeds
of dog (poodle, greyhound, basset, etc.) and cat (persian, chartreux, siamese, etc.).
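Rosch's vertical dimension can be sketched as a small data structure. The following is a minimal illustration using the mammal example from the text; the representation and the function name are invented here for illustration and are not part of Rosch's or Deliège's formalism:

```python
# Three-level category hierarchy: superordinate -> basic -> subordinate.
# The names mirror the mammal example in the text; the structure itself
# is an illustrative choice, not a claim about Rosch's theory.
taxonomy = {
    "mammals": {                                   # superordinate level
        "dog": ["poodle", "greyhound", "basset"],  # basic -> subordinates
        "cat": ["persian", "chartreux", "siamese"],
    }
}

def level_of(term, taxonomy):
    """Return the hierarchical level of a term, or None if absent."""
    for superordinate, basics in taxonomy.items():
        if term == superordinate:
            return "superordinate"
        for basic, subordinates in basics.items():
            if term == basic:
                return "basic"
            if term in subordinates:
                return "subordinate"
    return None

print(level_of("mammals", taxonomy))   # superordinate
print(level_of("cat", taxonomy))       # basic
print(level_of("poodle", taxonomy))    # subordinate
```

The horizontal dimension corresponds to the lists at the lowest level: the breeds of dog are the "variants" among which a prototype (the most representative member) would be located.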
A parallel between these theoretical principles and those described with respect to music could allow us to
see horizontality as being instantiated in the variations developed from a single cue - a dimension that could
prove particularly fertile in the case of musical composition and listening. Viewed from the perspective of
Similarity, they generate an Imprint, that is, an analogue of the prototype viewed from the angle of
typicality. Concerning the hierarchical aspects of the principle of verticality in the field of categorisation of
musical structures, one could find a parallel, for the basic level, in the various cues abstracted within a
piece. They satisfy the criterion of independence that Rosch's theory requires and generate their
subordinates, that is, the collection of structures derived from them in the form of variations, which
connects them with the notion of horizontality defined above. Finally, the effect of the Principle of
Difference is to segment the work into periods: those structures that are gathered around each cue that is
abstracted at the basic level (that is, the collection of variations derived from the cue) are grouped together at
the higher, superordinate level.
In conclusion, it seems that the principles governing the psychological organisation of music perception
adhere quite closely to the properties of Rosch's model. This positive feature as regards the development of
the present model of music perception should not, however, be understood as an attempt to veil the
complexity of that cognitive ability which is musical listening. As in any model, the project aims to gain an
understanding of the psychological landscape of a particular domain and then to predict and discern as
many aspects as possible. It is dangerous, however, to believe that a model could be complete and that it
could reproduce in all respects the phenomenal reality that it aims to account for. A recent warning from
Jean-Pierre Dupuy (1999) emphasizes, precisely with respect to this, that
a scientific model is right from the start an imitation that has the same relationship with reality
that a "reduced model" has with the object for which it is intended to be a more easily
manipulable copy. (p.18)
Subsequent stages of the research will have to put forth models in which the listening process is imitated
more and more literally. This will allow us to understand aspects of music cognition that we have not yet
even glimpsed.
References
Arom, S. (1985). De l'écoute à l'analyse des musiques centrafricaines, Analyse Musicale, 1, 35-39.
Bruner, J. S. (1958). On perceptual readiness. Psychological Review, 64, 123-152.
Deliège, C. (1984). Les fondements de la musique tonale, Paris, Lattès.
Deliège, I. (1987a). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's
grouping preference rules. Music Perception, 4 (4), 325-360.
Deliège, I. (1987b). Le parallélisme, support d'une analyse auditive de la musique : Vers un modèle des
parcours cognitifs de l'information musicale. Analyse musicale, 6, 73-79.
Deliège, I. (1989). Approche perceptive de formes contemporaines. In S. McAdams and I. Deliège (Eds.), La
Musique et les Sciences cognitives. Bruxelles: Pierre Mardaga, pp. 305-326.
Deliège, I. (1991). L'organisation psychologique de l'écoute de la musique. Des marques de sédimentation
- indices et empreinte - dans la représentation mentale de l'oeuvre. Unpublished doctoral thesis, Université
de Liège.
Proceedings paper
Peers are another key social group that can influence the behaviour and attitudes of young people (Best, 1983;
Boulton & Smith, 1994; Sroufe, Bennet, Englund, & Urban, 1993; Thorne & Luria, 1986; Thorne, 1986).
Research looking at the role of peers in the continuing motivation and participation in sports and the arts in
early adolescence, found that talented adolescents' relationships with peers appeared to serve an important
motivational function with respect to continued commitment to their talent area (Patrick, Ryan, Alfeld-Liro,
Fredricks, Hruda & Eccles, 1997). This is consistent with previous research on the role of peer support and
involvement with regard to participation in sport (Scanlan, Carpenter, Lobel & Simons, 1993). Berndt and
colleagues (1990) found that discussions among pairs of friends influenced their decisions in motivation-related
dilemmas (e.g. whether to complete a homework assignment) and that children tended to make friends who
were most similar in their decisions. Studies also suggest that peer group pressure to conform operates in the
domain of music. Children will hide their real musical interests in order to conform to group norms and avoid
the judgement and response of their peers (e.g. Finnas, 1987). They may even consider abandoning playing
instruments if the negative feedback from peers, such as bullying or name calling, becomes too much (Howe &
Sloboda, 1992). O'Neill and Boulton (1995b, see also O'Neill, 1997) found that both female and male
participants in their study thought a child of the same sex as themselves would be liked less, and bullied more,
by other children if they played an instrument that was considered 'gender inappropriate' (e.g. a boy playing the
flute, a girl playing the drums). Davidson, Howe, and Sloboda (1997) found it was important for children to
have the opportunity for informal musical engagement, such as playing with friends or family, as well as
opportunities for formal practice. Therefore children who have friends, in or out of school, who play
instruments may have more access to this kind of informal engagement, which can make music more
fun and acceptable in their peer group.
Research has shown that the teaching context can have a profound influence on children's performance
achievement and engagement. Indeed research investigating the development of children's musical skills has
placed great emphasis on the effect of teachers' expectations on learners' achievement, with low achievement
and low teacher expectation being highly correlated (e.g. Rosenthal & Jacobson 1968; Blatchford, Burke,
Farquhar, Plewis & Tizard 1989). An instrumental music teacher is often the first significant adult a child will
come into one-to-one contact with, other than members of the child's family or a child-minder. A music teacher
may spend up to one hour each week with a child, often for several years. It is a unique and possibly critical
learning relationship which may have long-lasting effects on the child. A study by Sloboda (1989) of
autobiographical memories of emotional responses to music in childhood found that adults who were not
involved in music, or who considered themselves to be unmusical, were more likely to report that they had
negative musical experiences in educational contexts during childhood (i.e., where some attempt to perform or
respond to music was criticised by early teachers), than adults who considered themselves to be musical.
Despite the important role music teachers play, few studies have explored the characteristics associated with
effective teachers of young instrumentalists. According to findings by Sosniak (1985, 1990), Sloboda and Howe
(1991) and Davidson et al. (1997), successful young musicians were more likely to have a first music teacher
who was reported to have characteristics such as warmth, enthusiasm and encouragement. These 'warmth'
characteristics were considered more important than these teachers' ability to display impressive
technical skills on an instrument. Children who had given up lessons did not differentiate between their initial
teacher's 'personal' and 'professional' characteristics in the same way. This suggests that in the early stages of
learning, the personal characteristics of teachers are important for promoting children's musical development
and their continuing of lessons. North, Hargreaves and O'Neill (2000) found significant sex differences in the
reasons young people gave for playing instruments. Boys were more concerned than girls with creating an
external impression, such as being trendy or creating an image, whereas a non-significant trend indicated that
girls were more concerned with pleasing others, such as parents, teachers and friends. O'Neill (1997) reports
that more girls than boys are involved in, and successful at, musical activities at school, with approximately
twice as many girls learning to play instruments. Girls also achieve a higher percentage of passes at all levels
than boys in school music examinations (DES, 1991).
There is little doubt that children's participation and level of engagement are inextricably linked to their social
and cultural environment. However, it remains unclear how children's perception of different social support
influences their involvement in music. The present study aims to address this by examining the relationship
between children's perceptions of social support by parents, peers and teachers and their levels of participation
in music.
Method
Participants
The present study is part of a longitudinal project investigating the social and motivational factors influencing
young people's participation and achievement in music. During Year 1, 1209 children (585 girls, 624 boys)
aged 10-11 years (mean age 10.5, SD 0.49), attending 35 primary schools in North Staffordshire,
participated in the project. Children were recruited through their schools and parental consent was obtained.
Procedure
The children completed a questionnaire designed to assess their level of engagement in music and perceived
social support. All items were answered using 7-point Likert-style response scales, except where categorical
responses were required, such as "Do you play an instrument? yes or no". All scales have good psychometric
properties (details given below). The questionnaires were administered with verbal instructions to the children
on a classroom basis in the selected schools. The children completed the questionnaires independently typically
within 45 minutes.
Measures
The children's level of musical participation was measured in terms of how often children reported playing
instruments, for example, how often they played an instrument by themselves or with friends. The 'Playing
Instruments' scale began with the phrase "How often do you...." and had anchors of (1) never to (7) very often.
The 10-item scale had good internal reliability (Cronbach's alpha = .87). (See Appendix for full list of scale
items for all measures).
The child's perceived social support from parents, peers and the school music teacher were measured by
separate support scales. The 'Parent' support scale began with the phrase "If you played a musical
instrument, how much do you think your parents would......" and had anchors of (1) not very much to (7) a lot.
The 12-item parent support scale assessed perceived support and expectations from parents, for example, how
much the children thought their parents would be pleased or help them to play an instrument. A principal
component analysis using varimax rotation confirmed there was one factor accounting for 50% of the variance.
The 'Parent' scale had excellent internal reliability (α = .9096). The 9-item teacher support scale began with the
same phrase and also assessed perceived support and expectations, for example, how much the children thought
their teachers were pleased with the work they do in class or wanted them to pass exams. Principal component
analysis confirmed the 'Teacher' scale also had one factor, accounting for 54% of the variance. The 'Teacher'
scale had high internal reliability (α =.8928). Peer support was measured by the 'Friendship' support scale which
also began with the same phrase as the parent and teacher scales: "If you played a musical instrument, how
much would your friends........" and had scale anchors of (1) not very much to (7) a lot. The 10 item friendship
support items measured social and active support from peers. Principal component analysis confirmed that the
scale also had one factor, accounting for 46% of the variance. The 'Friendship' scale had high internal reliability
(α =.8689).
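The internal-reliability figures quoted for each scale (Cronbach's alpha) can be computed directly from the item responses. A minimal sketch follows, using invented 7-point Likert data rather than the study's own; the standard formula is alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per scale item). Uses sample variances (n - 1)."""
    k = len(items[0])
    columns = list(zip(*items))
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Fabricated 7-point Likert responses: 6 children x 4 items.
responses = [
    [7, 6, 7, 6],
    [2, 3, 2, 2],
    [5, 5, 6, 5],
    [1, 2, 1, 2],
    [6, 7, 6, 7],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(responses), 2))  # high alpha: items covary strongly
```

Because the fabricated items rise and fall together across respondents, alpha here comes out well above the conventional .70 threshold, as with the scales reported above.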
Results
The results are presented in three sections. First, details of the principal component analysis performed on the
social support and playing instruments scales are presented, with details of the composite scale scores that were
created for further analysis. Second, descriptive statistics of the relations between the children's level of
participation and perceived social support are described. Finally, multiple regression analyses of the social
support predictors of children's level of participation are presented.
Principal Component Analysis
Playing Instruments
A principal component analysis with varimax rotation confirmed that there were two factors, accounting for
60% of the variance. The scale had high internal reliability (α = .8720), and the two factors were interpreted as
formal (M = 2.90, SD =1.7) and informal playing (M = 2.51, SD =1.4). Scale scores were created from the
mean composite scores for 'Formal' (5 items, α = .8407) and 'Informal' (4 items, α = .7238) playing and used in
all further analysis. The possible scale range was from (1) minimum to (7) maximum. The scale name 'Formal'
refers to playing that mostly occurs within the school setting, including the Year 6 music class and instrumental
lessons, whereas 'Informal' playing refers more to out of school playing, such as with friends or family
members.
Relations Between the Children's Level of Participation and Perceived Social Support
The children were assigned to a categorical cohort based on their self-reported level of participation in playing
instruments. The three cohorts are (1) Players (those who presently play an instrument), (2) Gave-ups (children
who had previously played an instrument but given up), and (3) Non-players (those who had never played an
instrument). These cohorts were used in further analyses to examine how children's perceived support
influences their level of participation in playing instruments.
Using GLM multivariate analysis, a 2 (Sex) x 3 (Cohort) x 3 (Support) analysis of variance (ANOVA)
identified significant main effects of Sex on the 'Parent' (F(1) = 46.275, p < .0001), 'Friendship'
(F(1) = 63.149, p < .0001), and 'Teacher' (F(1) = 14.587, p < .0001) support scales. Significant main effects
were also identified for Cohort on the 'Parent' (F(2) = 53.931, p < .0001), 'Friendship' (F(2) = 19.426,
p < .0001), and 'Teacher' (F(2) = 39.008, p < .0001) support scales. No significant interaction effect was
found. (See Table 1 for the means and standard deviations).
for the means and standard deviations).
Table 1
Mean Scores and Standard Deviations for Perceived Support as a Function of Gender and Cohort
Further post-hoc analyses of cohort and support identified significantly higher levels of perceived support from
parents, friends and teachers among children who reported they currently played an instrument compared to
those who had given up or never played. Children who had previously given up also reported higher perceived
support from parents than children who had never played an instrument. Examining the means also highlights
that the significant sex difference is due to girls reporting more perceived support from parents, friends and
teachers than boys.
Regression
The objective of this study was to examine how children's perception of social support influenced their level of
participation in instrumental music. To address this issue, two multiple regression analyses were calculated in
which either 'Formal' or 'Informal' playing of instruments was the outcome variable, with sex and the three
social support scales as the predictor variables.
Predicting Informal Playing
In order to predict informal playing a hierarchical multiple regression was calculated. As sex was found to
correlate significantly with informal playing (see Table 2) it was entered as a predictor at step 1. The three
support scales were entered simultaneously at step 2. By examining the individual regression coefficients for
each of these predictors at this step we could determine which, if any, of them were unique predictors of
informal playing. The three sex x support interaction terms were entered simultaneously at step 3. This latter
step allowed us to examine if the support variables differed in terms of their predictive power as a function of
sex. The results are summarised in Table 3. After the variance shared with sex had been controlled the three
support variables together accounted for a significant proportion of the variance in informal playing.
Additionally, the analysis revealed that all three support variables emerged as unique predictors of the
dependent variable. None of the interaction terms at step 3 either collectively or individually emerged as a
significant predictor.
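The step-wise logic just described (sex at step 1, the three support scales at step 2, comparing R-squared across steps) can be sketched as follows. The data are fabricated for illustration, numpy is assumed available, and the authors' step 3 (interaction terms) is omitted for brevity:

```python
import numpy as np

# Fabricated stand-ins for the study's variables; only the step-wise
# hierarchical-regression logic mirrors the analysis described above.
rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, n).astype(float)   # 0 = boy, 1 = girl
parent = rng.normal(4.0, 1.5, n)            # 'Parent' support scale
friend = rng.normal(4.0, 1.5, n)            # 'Friendship' support scale
teacher = rng.normal(4.0, 1.5, n)           # 'Teacher' support scale
informal = (0.5 * sex + 0.3 * parent + 0.2 * friend
            + 0.2 * teacher + rng.normal(0.0, 1.0, n))

def r_squared(y, predictors):
    """R-squared from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# Step 1: sex alone.  Step 2: add the three support scales simultaneously.
r2_step1 = r_squared(informal, [sex])
r2_step2 = r_squared(informal, [sex, parent, friend, teacher])
print(f"Step 1 (sex only):   R^2 = {r2_step1:.3f}")
print(f"Step 2 (+ supports): R^2 = {r2_step2:.3f}")
print(f"Delta R^2:           {r2_step2 - r2_step1:.3f}")
```

The increment in R-squared at step 2 is what the text means by the support variables accounting for "a significant proportion of the variance" after the variance shared with sex has been controlled; significance of the increment would in practice be tested with an F-change test.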
Table 2
Means, Standard Deviations, and Intercorrelations for Children's Level of Participation, Perceived Social
Support and Sex
Further analysis indicated that only the sex x teacher support interaction term was a unique predictor at this
step. In order to interpret this interaction effect we calculated the zero order correlation between teacher support
and formal playing separately for girls and boys.
The correlation coefficient between teacher support and formal playing was higher for girls (r = .404, p < .01)
than for boys (r = .360, p < .01), indicating that perceived teacher support was more strongly associated with
formal playing among girls than among boys.
Discussion
The significant effects of sex and cohort on the levels of perceived support confirm the importance of
perceived social support on children's level of involvement in playing instruments. Higher levels of perceived
support from parents, friends, and teachers were reported by children who currently play an instrument
compared to those who had given up or never played. Yet children who had previously given up playing an
instrument still reported higher levels of perceived support from parents than those who had never played. The
finding that children presently engaged in playing instruments perceive higher levels of support could be
interpreted as a result of their engagement leading to higher levels of involvement from parents, friends and
teachers. But as children who have given up still report higher levels of perceived support from parents than
those who had never played, it suggests that there is more than just actual engagement causing the difference in
perception. It is likely that children who perceive higher levels of parental support for playing instruments also
perceive a higher value for engagement in the activity. This suggests that valuing the playing of
instruments, combined with perceiving higher levels of support, encourages children to engage in playing
instruments more than children who do not perceive it as a valued activity in their social and cultural
environment.
Girls also reported more perceived support from parents, friends, and teachers than boys. As previous research
has found that girls engage more in musical activities than boys, and have higher levels of achievement, it is not
surprising that girls are found to perceive more support. It is likely that those who perceive support for engaging
in an activity will engage more and do better than those who do not perceive such support. However, the reverse
is also true: those who perceive low support for their engagement in playing an instrument will be less
motivated to continue and therefore will not achieve high levels of success, possibly abandoning playing
altogether. As music is also perceived as a 'girl' activity, girls are also less likely to be bullied for playing an
instrument. It is more likely that their friends will also play instruments, providing the opportunity for 'fun'
informal playing as well as structured formal practice.
The two types of playing identified suggest that children can engage in playing instruments in different ways,
with formal playing taking place mostly at school in a structured format and informal playing occurring out of
school, with less-structured opportunities for engagement with friends and family. Using multiple regression
analyses to explore the data separately for each type of playing it was found that all three support variables
emerged as unique predictors of informal playing. This finding suggests that all the perceived supportive
relationships can influence a child's engagement in informal playing of instruments, and that, in line with
Wentzel's (1998) findings, they can be additive rather than compensatory. As informal playing may take place in a number
of settings it is likely that whichever support relationship is most present will be the most important. In the case
of formal playing, the sex x teacher interaction term was the only unique predictor. As formal playing takes
place at school, it is not surprising to find that perceived teacher support is the significant unique predictor.
Further analysis of this finding also identified that it was girls who reported the higher levels of perceived
support from teachers. Indeed, as mentioned previously, girls reported significantly more perceived support
across all three support groups. The findings of this study highlight the importance of children's perceived social
support for their engagement in playing instruments. They also demonstrate that the different types of engagement
can require different sources of support, and that if sufficient support is found in the most prominent support
relationship, such as teachers for formal playing at school, this can lead to higher and more successful levels of
engagement in instrumental music.
Keywords: social support, children, participation
References
Berndt, T.J., Laychak, A.E., & Park, K. (1990). Friends' influence on adolescents' academic achievement
motivation: An experimental study. Journal of Educational Psychology, 82, 664-670.
Best, R. (1983). We've all got scars: What boys and girls learn in elementary school. Bloomington: Indiana
University Press.
Blatchford, P., Burke, J., Farquhar, C., Plewis, I., and Tizard, B. (1989). Teacher expectations in infant school:
Associations with attainment and progress, curriculum coverage and classroom interaction. British Journal of
Educational Psychology, 59, 19-30.
Boulton, M.J. and Smith, P.(1994). Bully/victim problems in middle school children. British Journal of
Developmental Psychology, 12, 315-329.
Csikszentmihalyi, M., Rathunde, K., and Whalen, S. (1993). Talented teenagers: The roots of success and
failure. Cambridge: Cambridge University Press.
Davidson, J.W., Howe, M.J.A., Moore, D.G, and Sloboda, J.A. (1996). The role of parental influences in the
development of musical ability. British Journal of Developmental Psychology, 14, 399-412.
Davidson, J.W., Howe, M.J.A., and Sloboda, J.A.(1997). Environmental factors in the development of musical
skill over the life span. In Hargreaves, D.J., and North, A.C. (Eds.). The social psychology of music. Oxford:
Oxford University Press. pp188-206.
Department of Education and Science (1991). Music for ages 5 to 14: Proposals of the Secretary of State for
Education and Science and Secretary of State for Wales. HMSO.
Eccles, J.A., Wigfield, A. and Schiefele, U. (1998). Motivation to succeed. In Damon, W. (Ed.). The Handbook
of Child Psychology, Vol 3, pp 1017-1095.
Finnas, L. (1987). Do young people misjudge each other's musical tastes? Psychology of Music, 15, 152-166.
Howe, M.J.A., and Sloboda, J.A. (1992).Problems experienced by talented young musicians as a result of the
failure of other children to value musical accomplishments. Gifted Education, 8, 1, 16-18.
North, A.C., Hargreaves, D.J., and O'Neill, S.A.(2000). The importance of music to adolescents. British Journal
of Educational Psychology, 70, 2, pp. 255-272.
O'Neill, S.A.(1994). Musical development: Aural. In A. Kemp (Ed.). Principles and processes of music
teaching. Reading: International Centre for Research in Music Education. pp. 1043.
O'Neill, S.A. and Boulton, M.J. (1995a). Is there a gender bias toward musical instruments? Music Journal,
60,358-359.
O'Neill, S.A. and Boulton, M.J.(1995b). Children's perceptions of the social outcomes of playing musical
instruments. Proceedings of the British Psychological Society, 3,1, 87.
O'Neill, S. A. (1997). Gender and music. In Hargreaves, D.J., and North, A.C. (Eds.). The social psychology of
music. Oxford: Oxford University Press. pp 46-63.
Patrick, H., Ryan, A. M., Alfeld-Liro, C., Fredricks, J.A., Hruda, L.Z., and Eccles, J.S. (1997). Commitment to
developing talent in adolescence: The role of peers in continuing motivation for sports and the arts. Paper
presented at the biennial meeting of the Society for Research in Child Development, Washington, DC.
Rosenthal, R. and Jacobson, L. (1968). Pygmalion in the classroom. New York: Holt, Rinehart and Winston.
Scanlan, T.K., Carpenter, P.J., Lobel, M., and Simons, J.P.(1993). Sources of enjoyment for youth sport
athletes. Pediatric Exercise Science, 5,275-285.
Sloboda, J.A.(1989). Music as a language. In F. Wilson and F. Roehmann (Eds.). Music and child development.
St. Louis, Miss.: MMB Music Inc. pp.28-43.
Sloboda, J.A. and Howe, M.J.A. (1991). Biographical precursors of musical excellence: An interview study.
Psychology of Music, 19, 3-21.
Sosniak, L.A. (1985). Learning to be a concert pianist. In B.S. Bloom (Ed.). Developing Talent in Young
People. New York: Ballantine.
Sosniak, L.A.(1990). The tortoise, the hare, and the development of talent. In M.J.A. Howe (Ed.). Encouraging
the Development of Exceptional Abilities and Talents. Leicester: The British Psychological Society.
Sroufe, L., Bennet, C., Englund, M., and Urban, J. (1993). The significance of gender boundaries in
preadolescence: Contemporary correlates and antecedents of boundary violation and maintenance. Child
Development, 64, 455-466.
Thorne, B. and Luria, Z. (1986). Sexuality and gender in children's daily worlds. Social Problems, 33, 176-190.
Thorne, B. (1986). Boys and girls together... But mostly apart: Gender arrangements in elementary schools. In
W. Hartup and Z. Rubin (Eds.). Relationships and development. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Wentzel, K.R., and Asher, S.R. (1995). Academic lives of neglected, rejected, popular, and controversial
children. Child Development, 62, 1066-1078.
Wentzel, K.R. (1998). Social relationships and motivation in middle school: The role of parents, teachers, and
peers. Journal of Educational Psychology. Vol. 90, No. 2, 202-209.
Yoon, K.S. (1997). Exploring children's motivation for instrumental music. Paper presented at the biennial
meeting of the Society for Research in Child Development, Washington.
Proceedings paper
One reads these cycles by moving through them in a clockwise fashion, starting from the "12 O'clock"
position. Most metric patterns are more interesting than this, in that they involve a hierarchy of
coordinated time-cycles created by temporal connections between "non-adjacent" events on the "outer
rim" of the metric diagram:
The outer rim represents the basic cycle of a meter; it represents the lowest/shortest/fastest level of the
metric hierarchy--not beats, typically, but beat subdivisions. The interior line-segments represent
higher levels of metric structure. The meter in this example is based on an 8-cycle, and the various
pathways within the cycle correspond to different levels of the metric hierarchy: the outer level
defines the cycle, the next defines the beats, the next the half-bar level, and the red loop the measure
itself.
Metric well-formedness may be expressed in terms of the following rules for constructing cyclical
representations:
Given a basic cycle of N elements, additional levels may be constructed, provided:
(a) each line segment connects non-adjacent time-points on the cycle (with the exception
of (d) noted below);
(b) each and every series of segments that represents a metric level must start and end at
the same location (for convenience, notated here as the "12 O'clock" position), forming a
sub-cycle;
(c) no crossing of line segments is permitted;
(d) the highest level of metric structure is represented by a loop to and from the cardinal
metric position.
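As an illustrative sketch only, and not part of the formalism above, rules (a) and (b) can be checked mechanically once a candidate metric level is written as a list of segment spans in basic-cycle units; rules (c) and (d) concern how levels nest and loop, and are not modelled here. The function name is my own:

```python
def is_well_formed(n_cycle, spans):
    """Check a candidate metric level against rules (a) and (b).

    n_cycle: number of time-points in the basic cycle (e.g. 8)
    spans:   segment lengths, in basic-cycle units, traversed
             clockwise from the 12 o'clock position.
    """
    # Rule (b): the series of segments must return to its starting
    # point, i.e. the spans must exhaust the cycle exactly once.
    if sum(spans) != n_cycle:
        return False
    # Rule (a): each segment must connect non-adjacent time-points,
    # so every span must skip at least one point (length >= 2).
    return all(s >= 2 for s in spans)

print(is_well_formed(8, [2, 2, 2, 2]))  # True: the beat level of an 8-cycle
print(is_well_formed(9, [2, 3, 4]))     # True: locally well-formed
print(is_well_formed(9, [1, 4, 4]))     # False: violates rule (a)
```

Note that, as the printout shows, the 2+3+4 sub-cycle discussed below passes these purely local tests; it is only the global evenness constraint that rules it out.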
Note that in the previous example each level follows these rules recursively--what were non-adjacent
points on the basic cycle become adjacent on the "first interior sub-cycle" of the diagram. N.B.: since
my main interest is the relationship between the basic cycle and the beat level of the measure, I will
omit higher levels of metric structure in my subsequent graphs. Also, to avoid unnecessary clutter, I
will omit the directional arrows, as one may assume all basic cycles and sub-cycles involve clockwise
directed motion.
These local constraints on metric well-formedness capture some basic aspects of metric structure. One
is that higher levels of meter are usually comprised of two or three elements from the level
underneath. Another is that one point in the measure--the downbeat--is of cardinal importance in the
alignment and coordination of metric processes. What they do not capture is a global constraint on the
spacing of higher-level articulations relative to the basic cycle, the principle of maximal evenness.
Maximal evenness is a concept developed in the study of pitch-class sets (Clough and Douthett,
1991). A maximally-even pattern is one in which a subset of M elements is spaced "as far apart as
possible" on the circle that represents their N-element superset. We may therefore note:
(a) the basic cycle itself is, by definition, maximally even;
(b) regular meters are, by definition, maximally even, since each beat is comprised of
the same number and kind of sub-division units;
(c) complex meters are also maximally even.
While the first two points are not remarkable, why should complex meters tend toward maximal
evenness? On a cyclical representation of a metric pattern, the fundamental constraint on the
formation of its metric hierarchy is the number of time-points in the basic cycle. This number
determines the various configurations that are possible within it, and so one can represent a
cyclically-defined set of meters in terms of the different interior patterns it may contain.
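Clough and Douthett's construction can be sketched as follows; the function names are my own, and the rotation index of their J-function is fixed at zero for simplicity:

```python
from math import floor

def maximally_even(n, m):
    """Positions of a maximally even subset of m elements in an
    n-element cycle (after Clough & Douthett's J-function)."""
    return [floor(i * n / m) for i in range(m)]

def spans(n, positions):
    """Clockwise distances between consecutive subset members."""
    return [(positions[(i + 1) % len(positions)] - p) % n
            for i, p in enumerate(positions)]

beats = maximally_even(9, 4)
print(beats)            # [0, 2, 4, 6]
print(spans(9, beats))  # [2, 2, 2, 3], a short-short-short-long pattern
```

Applied to the 9-cycle with four beats, the construction yields the 2+2+2+3 (SSSL) pattern discussed next.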
Consider a 9-cycle. According to the metric well-formedness rules given above, it may contain the
following three- and four-beat sub-cycles: (a) a pattern of three evenly-spaced beats (familiarly, 9/8),
and (b) a complex four-beat pattern of 2+2+2+3, a short-short-short-long (SSSL) series.
As an aside, note, in the case of the complex meter, that the location of the long relative to the
downbeat may shift, but the SSSL series remains unaffected, since the L always loops back to the first
S. Why not, however, have a sub-cycle of 2+3+4?
A common argument against this configuration is that the segment which spans four articulations of
the basic cycle "naturally" devolves to 2+2, since the "duple" unit tends to persist in the listener's
perception and anticipation. This is a level-specific rationale. Maximal evenness provides a global
rationale, one that assumes that the listener will gravitate towards the most parsimonious attending
strategy. The simplest attentional frameworks are comprised of categorically-equivalent spans on each
and every metric level. As a metric pattern, the 2-3-4 sub-cycle involves three categorically different
time intervals, a short, a medium, and a long. The 2-3-4 sub-cycle is of course not maximally even.
But this pattern of time intervals does align with the maximally-even 2-3-(2-2) sub-cycle, and this
pattern involves only two categorically distinct time-intervals. Thus I would conjecture that while a
four-beat pattern is nominally more complex than a three-beat pattern, because the 2-3-2-2 pattern is
both maximally even and involves fewer distinct durational categories, it is in fact the preferable
attending strategy. This is an instance of a general problem: in inferring a meter from a durational
surface, when are long surface durations "split" into shorter sub-articulations when metrically
interpreted? An answer may be: whenever splitting a long duration preserves maximal evenness.
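The splitting conjecture can be made concrete in a small sketch, assuming (my reading, not a definition given in the paper) that a beat pattern counts as maximally even when it is a rotation of the Clough-Douthett pattern for its cardinality:

```python
from math import floor

def me_pattern(n, m):
    """Span pattern of the maximally even m-in-n subset."""
    pos = [floor(i * n / m) for i in range(m)]
    return [(pos[(i + 1) % m] - p) % n for i, p in enumerate(pos)]

def is_maximally_even(n, pattern):
    """True if `pattern` (spans summing to n) is, up to rotation,
    the maximally even pattern for its number of beats."""
    m = len(pattern)
    rotations = [pattern[i:] + pattern[:i] for i in range(m)]
    return me_pattern(n, m) in rotations

print(is_maximally_even(9, [2, 3, 4]))     # False: three duration classes
print(is_maximally_even(9, [2, 3, 2, 2]))  # True: splitting the 4 restores evenness
```

Splitting the long span of 2+3+4 into 2+3+2+2 thus moves the pattern into the maximally even class, in line with the answer proposed above.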
This brief examination of the metric possibilities of the 9-cycle is but one instance of how cyclical
representations allow one to examine the formal properties of various metric systems. One may also,
for instance, consider the differences in N-element basic cycles when N is a prime versus non-prime
number. Similarly, an examination of the 12-cycle shows that it contains a great number of sub-cyclic
configurations, from symmetrical 3, 4, and 6 beat patterns to a wide variety of complex meters. This
perhaps explains its cross-cultural ubiquity, as it is so rife with metric possibilities.
Here the beats follow a 3-3-2 pattern of basic cycle elements; the ratio of the long beat to the short
beat is, as is typical, 3:2. Since we prefer a meter in which all beat-level periodicities fall in the range
of maximal pulse salience, we can note the following effect of tempo changes on this pattern:
Long Beat Interval Short Beat Interval Basic Cycle Interval
600ms 400ms 200ms
750ms 500ms 250ms
825ms 550ms 275ms
900ms 600ms 300ms
1050ms 700ms 350ms
1200ms 800ms 400ms
Only when the interval of the basic cycle falls within the 250-300ms range do both the long and short
beats fall within or near the range of maximal pulse salience. This suggests that complex meters may
be more sensitive to tempo constraints than simple meters, and that tempo constraints may play a role
in limiting the range of possible sub-cycles when the basic cycle itself is made up of many (that is,
more than 16) elements.
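The effect of tempo on the 3-3-2 pattern can be sketched numerically. The 500-900 ms band used below is my own illustrative stand-in for the region of maximal pulse salience; the paper itself gives no explicit endpoints:

```python
def beat_intervals(basic_ms, pattern=(3, 3, 2)):
    """Beat-level intervals (ms) for a beat pattern measured in
    multiples of the basic-cycle interval."""
    return [units * basic_ms for units in pattern]

# Assumed, for illustration only: maximal pulse salience spans
# roughly 500-900 ms.
SALIENT = range(500, 901)

for basic in (200, 250, 275, 300, 350, 400):
    beats = beat_intervals(basic)
    verdict = "all salient" if all(b in SALIENT for b in beats) else "outside range"
    print(basic, beats, verdict)
```

Under these assumed endpoints, only the 250-300 ms basic-cycle rows come out "all salient", matching the observation above.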
This diagram is for the second measure of Chopin's E-major etude, taken from the grand average of
timings from 27 performances by nine different pianists (see Repp 1998a, p. 268). As can be seen,
neither the basic cycle nor the 4-beat sub-cycle involves isochronous time intervals. Indeed, the 4 beat
sub-cycle bears more than a little resemblance to the 4-in-9 sub-cycle given above, though of course
here the last beat is considerably shorter than the others, whereas in the 4-in-9 sub-cycle, the last beat
is considerably longer. Notice also the range of time intervals that occurs here on the basic cycle
itself, from a minimum of 436ms to a maximum of 617ms.
What is going on in this measure? In a word, rubato. The story goes something like this: In the first
part of the measure we had a slight bit of expressive timing variation, with the 2nd half of each beat
being stretched by about 40-50ms; in this fashion, the first two beats follow a predictable pattern
(more on this in a moment). On the third beat, however, we have a more dramatic bit of rubato
(corresponding with the onset of a sustained tone in the melody), as the first half of beat 3 is ≈100ms
longer than the first half of beats 1 and 2. Now the pianist must regain the time s/he has taken, and the
remaining time-intervals make up the "stolen time"; as such they are correspondingly short. Notice,
however, that even under this constraint the second half of the last beat has a discernible stretch
relative to its first half.
Here is where a bit of mathematical graph theory may be of use. Let us suppose that the rubato on the
third beat of this measure had not occurred. In that case, the timing data might have looked something
like this:
Here we see a pattern on the basic cycle of a regular alternation of slightly shorter--slightly longer
time intervals, what I have labeled T1 and T2. Notice that each "odd" location (the filled dots) on the
graph is symmetrically positioned, as each sub-cycle re-integrates to a constant timing value. In graph
theory, cycles with such symmetrical properties are reducible to more compact graphic
representations, what are referred to as voltage graphs (a term obviously borrowed from their usage in
electrical engineering). The right-hand panel above shows the voltage graph for an 8-cycle comprised
of alternating T1 and T2 values, values for the different "voltages" between the on-beat and off-beat
timepoints. The voltage graph above will generate various basic cycles when "counted" according to a
particular arithmetic modulus. Thus the 8-cycle is generated by the given voltage graph, modulo 4. In
the case of the E-major etude, we can specify that the ratio between T1 and T2 should be ≈48:52.
Other timing ratios may be specified, such as a shift from "straight" to "swung" 8th notes in a jazz
performance style.
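A voltage graph of this kind can be unrolled programmatically; the helper below and the 500 ms beat duration are my own illustrative assumptions:

```python
def generate_cycle(voltages, modulus):
    """Unroll a voltage graph into a basic cycle of time intervals:
    the tuple of 'voltages' (T1, T2) is traversed `modulus` times."""
    return [v for _ in range(modulus) for v in voltages]

# A beat duration of 500 ms split ~48:52 between on-beat and
# off-beat intervals, as in the simplified etude example.
t1, t2 = 240, 260
print(generate_cycle((t1, t2), 4))
# [240, 260, 240, 260, 240, 260, 240, 260]
```

A shift to "swung" 8th notes would simply substitute a different voltage pair, e.g. a roughly 2:1 split of the beat, with no change to the generating procedure.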
Because the rubato performance reported by Repp lacks the symmetry of the simplified version given
above, one cannot reduce it to a simple voltage graph (nor can one use a voltage graph to generate its
complete metric cycle). Similarly, one cannot reduce the 4-in-9 sub-cycle to a corresponding
voltage graph, since it too lacks the requisite symmetry. Let me add that it may be possible to make
alternative graphic representations of the "rubato 8" and 4-in-9 patterns, using what are known as
permutation voltages, and so it would not be correct to infer that such graphs are irreducible. But the
geometric similarities between the "rubato 8" and the "4-in-9" are highly suggestive.
To the extent that a meter can be reduced to a simple voltage graph, one need not include higher-level
timings as part of a structural representation, which is to say, as part of the listener's temporal
attending strategy. In these instances, the higher levels "take care of themselves" as a byproduct of the
cyclical generation from the underlying voltage graph. By contrast, complex meters (such as the
4-in-9) and simple meters with multi-leveled expressive variation (the rubato 8) require more levels of
structure in their representation(s); the rubato 8 example shows how low-level timing changes can
"trickle up" to affect higher levels of attending/anticipation. This implies that as attentional strategies,
these meters require a greater interplay of top-down and bottom-up information--indeed, one cannot
build up higher levels from lower levels, but must instantiate the metric hierarchy in toto (see London
1995, p. 73).
Complex meters (such as the 4-in-9) and simple meters performed with a high degree of expressive
variation (such as the rubato 8) have a number of formal and cognitive similarities, from maximal
evenness of events on each level to timing constraints on local and global levels. These meters are
thus in many ways more alike than they are different. Given that most human musical performance
involves multi-leveled expressive variation, one is led to question the validity of the simple-complex
metric distinction. While one may draw this distinction in theory, in practice, metric attending is
almost always complex.
Works Cited
Berz, W. L. (1995). Working Memory in Music: A Theoretical Model. Music Perception, 12,
353-364.
Clough, J. and J. Douthett (1991). Maximally Even Sets. Journal of Music Theory, 35, 93-173.
Fraisse, P. (1982). Rhythm and Tempo. In The Psychology of Music, ed. D. Deutsch. New York,
Academic Press: 149-180.
Hirsh, I. J., C. B. Monahan, et al. (1990). Studies in Auditory Timing: 1. Simple Patterns. Perception
and Psychophysics, 47, 215-226.
Komar, A. J. (1971). Theory of Suspensions. Princeton, Princeton University Press.
Lerdahl, F. and R. Jackendoff (1983). A Generative Theory of Tonal Music. Cambridge, MIT Press.
London, J. M. (1995). Some Examples of Complex Meters and Their Implications for Models of
Metric Perception. Music Perception, 13, 59-78.
Repp, B. H. (1995). Detectability of duration and intensity increments in melody tones: A partial
connection between music perception and performance. Perception and Psychophysics, 57,
1217-1232.
Repp, B. H. (1998a). Obligatory 'expectations' of expressive timing induced by perception of musical
structure. Psychological Research, 61, 33-43.
Repp, B. H. (1998b). The Detectability of Local Deviations from a Typical Expressive Timing
Pattern. Music Perception, 15, 265-289.
Repp, B. H. (1999). Detecting Deviations from Metronomic Timing in Music: Effects of Perceptual
Structure on the Mental Timekeeper. Perception and Psychophysics, 61, 529-548.
Roederer, J. G. (1995). The Physics and Psychophysics of Music: An Introduction. New York,
Springer Verlag.
Sloboda, J. A. (1983). The Communication of Musical Metre in Piano Performance. Quarterly Journal
of Experimental Psychology, 35A, 377-396.
Todd, N. P. M. (1995). The Kinematics of Musical Expression. Journal of the Acoustical Society of
America, 97, 1940-1950.
Zuckerkandl, V. (1956). Sound and Symbol: Music and the External World. New York, Pantheon
Books.
Proceedings paper
Recent interest in performance skill acquisition, social interaction and expressive body movement has
opened up exciting new areas of research in musical performance - namely, that of the methods by
which skilled musicians communicate in ensembles. In such situations, verbal, musical and visual
cues must be established and shared between co-performers to enable and drive successful rehearsals
and performances.
Verbal feedback is a fundamental mode of communication in rehearsal situations. Murnighan and
Conlon (1991), for instance, studied group function among string quartet players and found that topics
were often discussed in rehearsal for points of clarification and unity.
It is important to note, however, that much of the exchange between musicians is unspoken. Indeed, a
substantive literature demonstrates that expert musicians can effectively communicate their musical
ideas through non-verbal means by varying such features of the music as timing (Povel, 1977; Shaffer,
1980, 1981, 1984; Shaffer, Clarke & Todd, 1985; Clarke, 1982, 1985; Repp, 1996, 1997), intensity
(Patterson, 1974; Kamenetsky, Hill & Trehub, 1997) and pitch (Schoen, 1922; Bartholomew, 1934;
Deutsch & Clarkson, 1959). Nevertheless, some features of the musical score can be more or less
emphasised depending on moment-by-moment and quite spontaneous modifications to interpretations
(Sloboda, 1985). So, it appears that rehearsals are occasions for co-performers to learn the score,
plan the co-ordination of timing, and establish the general expressive features of the music. In a live
performance situation, variations which occur spontaneously are critically dependent on performers
being able to detect and act immediately upon another's ideas.
Besides communicating information about the co-ordination of timing, expressive ideas and personal
support between co-performers, Davidson (1993, 1994, 1995) has shown that specific movements
reveal much about performers' expressive interpretation of musical structure. For example, in a
performance of one of Beethoven's Bagatelles for piano, a pianist used highly distinctive head shaking
movements consistently whenever he played a cadence, and a wiggle movement of the upper torso
when playing ornaments. These movements were found to be absolutely integral to the production of
the music.
Aside from verbal and musical exchanges, visual cues are integral to ensemble performance. Clayton
(1985) discovered that without visual feedback co-performers found it extremely difficult to
co-ordinate musical timing. Additionally, Yarbrough (1975) showed that co-performer interactions are
successful when there are high levels of eye contact and use of facial and bodily expressions.
Of course, moment-by-moment behaviours of all kinds - be they verbal, musical or visual - are
embedded within a socio-cultural framework. Thus, when considering co-ordination between
co-performers, researchers must be aware of overriding socio-cultural factors which shape the
interaction processes (e.g. social etiquette and a learned cultural aesthetic).
In research terms, the vital social communication aspects of rehearsal and performance have been
largely ignored (Davidson, 1997). In this paper, we hope to provide initial insight into these salient
aspects of musical skill and development by examining and detailing the video data collected from
two pianists when rehearsing and performing piano duos and duets. We focus on the exchanges which
take place between players during the rehearsals and a subsequent performance of an entire recital
programme. The piano duo was specifically chosen because both performers contribute similar
musical elements to the performance and both use comparable instrumental techniques.
Method
Participants
Two highly skilled male pianists with a mean age of 23 years participated in the study. Both had a
wide range of solo and accompanying experience. Both had played the piano since the age of seven
years and so had about fifteen years of learning and performing experience. They accompanied at
least two concerts a week, and played solo repertoire in addition. Although they had met informally
on two previous occasions, they did not know one another prior to working for the current project.
Materials
A VHS camcorder was used to record all rehearsals and the performance given by the two pianists.
Two Sony Walkman tape recorders were used to record all independent, individual practice by the
performers and to collect data from post-performance interviews with them. Based on observations of
the taped material, the researchers designed a semi-structured interview schedule to ask the pianists
questions about the process of rehearsing and performing together.
Procedure
The researchers approached the two pianists and offered them the opportunity to perform a 30 minute
lunchtime recital of piano duos and duets at the University of Sheffield. Both agreed and were set a
performance date for ten weeks later. They were asked to select their own repertoire, arrange all their
own rehearsals and prepare as normally as possible, but with the additional request of recording all
joint rehearsals and the performance on video tape and all individual practice sessions on cassette
tape. The pieces they performed were:
1. Variations on a Theme by Beethoven by C. Saint-Saëns for two pianos
2. Second Movement of Concerto in C for two keyboards by J. S. Bach
3. Sonate for four hands by F. Poulenc
Explorations of the Data
General observations of the pianists' practice and performance were made across all of the prepared
compositions. Once these data were collected, the analyses took the following forms:
1. a preliminary exploration of the content of the obtained data by both researchers;
2. systematic observations leading to both qualitative and quantitative measures of eye contact,
non-verbal gestures and spoken communication;
3. thematic content analyses of interview data.
Summary of Results
General observations
Prior to this project, neither of the pianists had played the pieces. In a brief telephone conversation
arranging the first rehearsal, both mentioned that they had previously heard the Saint-Saëns and the
Poulenc and that these might be interesting pieces to explore.
Throughout the rehearsal period, the Saint-Saëns and Poulenc were practised most (i.e. on four
occasions prior to the performance). These were at fortnightly intervals up to the date of the recital.
Alongside these pieces, they also rehearsed the first movement of Bach's Concerto in C. At the end of
the second rehearsal, however, they decided to run through the second movement of this concerto in
the next session in order to create a more musically diverse recital programme - both the Saint-Saëns
and the Bach first movement included fast fugue sections. The second movement of the Bach,
therefore, was only rehearsed twice prior to the performance, with both rehearsals taking place three
and two days, respectively, beforehand. Surprisingly, the pianists never individually practised the
pieces, only ever practising during the video recorded sessions. Both musicians attributed this
apparent lack of practice to their fluent sight-reading abilities.
Discussion
Regardless of the repertoire being played, it seems that these two pianists were able to converse
"musically", with information being given and received, modified and consolidated with negligible
verbal interaction. In association with the musical refinements, gestural cues and eye contact became
gradually more refined, synchronous and fluent over the rehearsal period. The performance itself
clearly reflected all the refinements the rehearsal process had brought.
In line with previous research, it was discovered that the pianists were concerned about using
expressive devices, such as highlighting structural features in the music. They also spoke of the
importance of a shared emotional state and conception of each piece's narrative.
In terms of eye contact, the current study consolidates Clayton's (1985) finding that eye contact
between co-performers is critical in the co-ordination of musical content. The performers purposefully
synchronised their glances at major structural points in the compositions.
The movement gestures used could be categorised in terms of illustrators and emblems as suggested
by Davidson (1997). The high hand lifts of the two pianists illustrated the mutual energy and force
both were bringing to the performance and were simultaneously emblematic of a nineteenth century
extravagant pianistic style.
Irrespective of the composition being played, the swaying movement was a feature which developed
throughout the course of the rehearsals and was at its most obvious during performance. It was allied
to phrase structure and overall tempo of each specific piece, but most significantly was highly
illustrative of the emotional intention of the performers. This observation seems to reflect the
theoretical proposals of Runeson and Frykholm (1983) who argue that there is an attunement of
kinematics to dynamics in human actions. That is, internal states and intentions become manifest in
movement. "Movements specify the causal factors of events" (Runeson & Frykholm, 1983, p. 585).
Swaying as the key source of expressive movement in pianists has already been described by
Davidson (1997, 2000). In her theoretical proposal, it seems that the hip region of the pianist acts as a
fulcrum and generator of expressive movement.
In line with work by Cutting and Proffitt (1981), the swaying could represent the global level in a
hierarchy of expressive gestural information, with the hands providing a local indicator. From our
observations, this would seem to be the case. As mentioned above, however, we are aware of
socio-cultural influences which affect this movement production (e.g. adopting nineteenth century
emblematic gestural piano style).
This research allows us to gain initial insight into how two individuals articulate their ideas in both the
construction and execution of an ensemble piece of music. Further research is necessary to validate
and ground these findings.
References
Bartholomew, W. T. (1934). A physical definition of "good voice-quality" in the male
voice. Journal of the Acoustical Society of America, 6, 25-33.
Clarke, E. F. (1982). Timing in the performance of Erik Satie's "Vexations." Acta
Psychologica, 50, 1-19.
Clarke, E. F. (1985). Some aspects of rhythm and expression in performances of Erik
Satie's "Gnossienne No. 5". Music Perception, 2, 299-328.
Clayton, A. M. H. (1985). Coordination Between Players in Musical Performance.
Unpublished PhD Thesis, University of Edinburgh.
Cutting, J. E., & Proffitt, D. R. (1981). Gait perception as an example of how we may
perceive events. In R. D. Walk & H. L. Pick (Eds.), Intersensory Perception and Sensory
Integration (pp. 249-273). New York: Plenum.
Davidson, J. W. (1993). Visual perception of performance manner in the movements of
solo musicians. Psychology of Music, 21, 103-113.
Davidson, J. W. (1994). Which areas of a pianist's body convey information about
expressive intention to an audience? Journal of Human Movement Studies, 26, 279-301.
Davidson, J. W. (1995). What does the visual information contained in music
performances offer the observer? Some preliminary thoughts. In R. Steinberg (Ed.), The
Music Machine: Psychophysiology and Psychopathology of the Sense of Music (pp.
105-113). Springer Verlag.
Davidson, J. W. (1997). The social in musical performance. In D. J. Hargreaves & A. C.
North (Eds.), The Social Psychology of Music (pp. 209-228). Oxford: Oxford University
Press.
Davidson, J. W. (in press for 2000). Understanding the expressive movements of a solo
pianist. Deutsche Jahresbuch fur Musikpsychologie.
Deutsch, J. A., & Clarkson, J.K. (1959). Nature of the vibrato and the control loop in
singing. Nature, 183, 167-168.
Kamenetsky, S. B., Hill, D. S., & Trehub, S. E. (1997). Effect of tempo and dynamics on
the perception of emotion in music. Psychology of Music, 25, 149-160.
Murnighan, J. K., & Conlon, D. E. (1991). The dynamics of intense work groups: A
study of British string quartets. Administrative Science Quarterly, 36, 165-186.
Patterson, B. (1974). Musical Dynamics. Scientific American, 231, 78-95.
Povel, D. J. (1977). Temporal structure of performed music: Some preliminary
observations. Acta Psychologica, 41, 309-320.
Repp, B. (1996). Patterns of note onset asynchronies in expressive piano performance.
Journal of the Acoustical Society of America, 100, 3917-3932.
Repp, B. (1997). Some observations on pianists' timing of arpeggiated chords.
Psychology of Music, 25, 133-148.
Runeson, S., & Frykholm, G. (1983). Kinematic specification of dynamics as an
informational basis for person-and-action perception: Expectations, gender, recognition,
and deceptive intention. Journal of Experimental Psychology: General, 112, 585-615.
Schoen, M. (1922). An experimental study of the pitch factor in artistic singing.
Psychological Monographs, 31, 230-259.
Shaffer, L. H. (1980). Analysing piano performance. In G. E. Stelmach & J. Requin
(Eds.), Tutorials in Motor Behaviour. Amsterdam: North-Holland.
Shaffer, L. H. (1981). Performances of Chopin, Bach, and Bartok: Studies in motor
programming. Cognitive Psychology, 13, 326-376.
Shaffer, L. H. (1984). Timing in solo and duet piano performances. Quarterly Journal of
Experimental Psychology, 36A, 577-595.
Shaffer, L. H., Clarke, E. F., & Todd, N. P. (1985). Metre and rhythm in piano playing.
Cognition, 20, 61-77.
Sloboda, J. A. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford:
Oxford University Press.
Yarbrough, C. (1975). Effect of magnitude of conductor behaviour on students in mixed
choruses. Journal of Research in Music Education, 23, 134-146.
Proceedings paper
Introduction
The theory put forward in this paper is based on the view that musical sound originates from an inner
impulse to move. During a musical performance, the many and varied movements of the body are the
vehicles through which a musical idea becomes audible. Clarke and Davidson (1998) suggest that to a
large extent the expressive intentions generated by a performer have their origin in corporeal
experiences. Therefore it may be proposed that an audible trace of the original kinetic impulse exists
within the sound spectrum of music. It is interesting to note that musical utterances of the same
intensity, contour, timing profile or overall kinetic quality can be achieved by the extremely varied
corporeal requirements of different instruments. For instance, the arm-wrist-hand movements
controlling a bow, or the tongue and breath manipulation on a mouthpiece can produce a very similar,
if not identical, musical communication. This may indicate the existence of a kinetic or body-based
'language' that can be expressed by any body movements and is detectable in the resulting sound
stream. The kinetic quality of a musical gesture, or indeed any kind of motion, is what I shall be
referring to as 'dynamics' throughout this paper. In the following theory, 'dynamics' constitute
perceptually available elements present within both musical and physical movement.
Motion in music and motion in the physical world share an essential character: both are constantly
moving or evolving. It could be said that both are fundamentally in a constant state
of flux. Apart from the temporal flux of music and movement, there are countless other elements that
are constantly changing. Consider the possible parameters in flux during a single musical utterance.
The musical tones can move up or down, the degree of weight placed on each note can change
dramatically, the tone can expand or contract. Further to this, any musician would agree that the
degree of fluctuation possible within each of these factors is immense. However, it is flux that
presents, perhaps, some of the most difficult problems to psychologists interested in the perception of
any kind of motion. The moment-to-moment generation and perception of music and movement is
difficult to study because of the number of complex factors that are constantly changing. The main
purpose of this paper is to describe the nature and extent of our ability to perceive flux information
from both our physical and acoustic environments. In addition, the possible usage and meaning of flux
information is also discussed.
The Transmodal Communication of Dynamics
In her work on 'intuitive parenting', Papousek (1996) observes how adults involved in active
communication with an infant will synchronise the force and temporal pattern of their own modalities
in order to convey a unified message to the child. For instance, Papousek shows that when seeking to
calm a child the adult may use quiet, slow, repetitive sounds alongside soft, slow and repetitive
movements, thus conveying the same energy profile, or what I term 'dynamics', through two
modalities. In order to comprehend a transmodal communication, the infant would have to form some
kind of abstract, amodal representation of the significant perceptual elements. It is widely believed by
many developmentalists that humans are born with the ability to perceive the amodal qualities of
events and the environment (see Bower, 1974; Spelke, 1987). Indeed, Bower (1974) maintains that
infants learn to distinguish between sensory modalities at a later stage in development. Meltzoff
(1981) showed that infants could reliably copy the dynamic profile of visual stimuli. In one
experiment, infants were presented with a visual tongue protrusion and were able to repeat the action
with obvious attention to the manner in which it was performed. The ability of infants to achieve
intermodal matching as well as transmodal communication has been well documented by
developmentalists (Stern 1985, Maratos 1998). From the evidence presented above, it appears that we
can comprehend the transmodal communication of dynamic elements from infancy.
Stern (1985) asserts that the infant's entire world is one of "shapes, intensities and temporal patterns".
The infant experiences everyday events of waking, becoming tired or becoming hungry as
certain dynamic profiles without realising the function and significance of these events as an adult
would. Further to this, Stern notes that it is more probable that infants will perceive the actions of
adults - such as the way in which an adult reaches for the feeding bottle, or the way in which the adult
picks up the child - in terms of the dynamic profile of the action as opposed to knowledge of the
action's goal. This may indicate the use of a direct perceptual mechanism as opposed to a more
cognitive process. In the above experiments and observations, it can be seen that infants display the
ability to recognise and repeat dynamic communication. This dynamic communication appears to be
based on energy fluctuations, on the flux of motion and could, perhaps, be described as a prelinguistic
language. If there is, as the evidence suggests, an amodal 'dynamic' language then what kind of
information can it communicate?
The idea that bodily movement betrays inner states, as McLaughlin notes, can be traced right back to Freud, who remarked that:
" He that has eyes to see and lips to hear may convince himself that no mortal can keep a secret. If his
lips are silent, he chatters with his fingertips; betrayal oozes out of him at every pore." (Freud1905 in
McLaughlin 1992)
It is apparent that our motion perception is a highly evolved, incredibly sensitive system, which is
needed, amongst other reasons, to gain knowledge of the identity and intentions of others.
Dynamics in Music: The Perception of Motion in Music
The question concerning many music psychologists today is whether our perception of motion in
music functions in a similar way to physical motion perception. Many psychologists, philosophers and
writers refer to the link between musical and physical motion, often using a combination of anecdotal
and empirical evidence in an effort to explain a perceptual link which clearly exists. Storr (1992)
discusses how music simply 'resonates' with our physical experiences. Ansdell (1995) describes the
ability of music to 'animate' us, that is, to provide an impetus for movement. Truslit, who likens
music to an "invisible, imaginary dance" (see Repp 1993), describes how musical phrases/gestures can
be thought of as physical 'shapes'. In one experiment, Truslit asked musicians to interpret the manner
in which they should play a scale from looking at a drawn, linear shape. He describes how the
performances changed markedly according to which 'shape' was being read, and notes that he could
successfully predict the gestural quality of the performance from the given shape.
Evidence supporting our ability to think in shapes as opposed to discrete, individual elements can be
seen in the work of Dowling (1982). He describes how our memory displays a definite preference for
remembering melodic sequences as a single contour (i.e. a single 'shape') rather than as a group of
discrete tones. Furthermore, Chang and Trehub (1977) revealed that infants as young as five months
old could recognise contour changes in an atonal melody, thus revealing that the infants could recognise
a contour as an entity in itself. Davidson (1993) found that subjects could distinguish between three
different manners of performance (exaggerated expression, normal expression and deadpan
expression) from the performers' movements alone, as well as from the music alone. This experiment
reveals that expressive information can be co-specified in a performer's music and movement.
The above studies may support the claim that there are similar perceptual elements available in both
physical and musical motion. Clearly many more studies need to be carried out in order to explore
motion perception in music; however, it has been shown both that movement perception is a highly
evolved, sensitive system and that similar motion information can be specified in the acoustic
spectrum. It remains to be seen whether the motion information available in music can signify
psychological states and intentions in the same way and to the same extent as physical motion.
The Language of Dynamics: The Invariant Perceptual Elements of Motion
Aristotle, describing Plato's 'Theory of Forms' notes that if thought needs an object on which to think
then there must be fixed concepts of aspects of the environment. These fixed concepts would enable
us to recognise objects for what they are, even though they appear somewhat different when they are
in motion, as "....there is no knowledge of things which are in a state of flux", (from 'Metaphysics', see
Flew 1989). This, rather neatly, presents the central conundrum facing psychologists interested in the
perception of motion. The psychologist Gibson provided the first comprehensive theoretical proposal
that tackled this problem. Gibson (1960) suggests that in order to perceive a stable environment, we
must be able to detect constant, invariant elements in the visual and acoustical array. These
'invariants', as Gibson calls them, consist of certain ratios and proportions that can be directly
perceived (i.e. without further cognition) by the observer. For instance, all the elements we need to
specify the cause of an acoustic event, Gibson argues, are available in the 'wave train'. The complex
frequency array specifies the vibratory event and the sequence of transients provide information
pertaining to the temporal profile of the event. However, I find that this theory could be somewhat
misleading for the psychologist studying expressive motion, as Gibson implies that flux information is
to a large extent discarded in the process of perceiving invariants: "The hypothesis is that constant
perception depends on the ability of the individual to detect invariants, and that he ordinarily pays no
attention whatever to the flux of changing sensations" (Gibson 1960). For instance, when considering
the perception of a flat object, it would indeed be necessary to perceive the invariant properties of
the object - its flatness, density and weight - and discard the enormous amount of changing sensory
data from our fingertips. However, when it comes to the perception of expressive motion, it is just
this very complex, constantly changing information that we need to perceive. Clearly there must be
invariant perceptual factors available in motion, but they are invariants that change from moment to
moment. Mechanisms must exist to perceive invariant factors at different stages of flux and the
perceptual system would need to recognise these factors as essentially transient in form. When
listening to musical motion, there may be certain fluctuating elements of the acoustical spectrum.
These would enable us to comprehend the meaning of the motion, as acoustical information contains a
deluge of highly complex frequency and temporal information. Deutsch (1999) notes that there exist
many ordering and grouping mechanisms that serve to recreate the original acoustic event. Grouping
mechanisms rely, to some extent, on the perception of constant factors within the acoustical array.
The Meaning of Flux: Physical Gestures Heard in Sound
The dance theorist Rudolf Laban describes the content factors necessary to produce movement for the
purpose of dancer/actor training. Laban (1960) states that in order to move we need to manipulate the
weight of our bodies, the space around us, the temporal profile of our actions and the flow or
continuity between movements. These four factors, (or 'motion factors' as Laban refers to them) of
Weight, Time, Space and Flow can be combined and manipulated to produce the entire spectrum of
human movement. Laban describes the construction of eight fundamental actions or gestures from
combinations of the most basic opposite attitudes that can be displayed towards the motion factors of
Weight, Time and Space. (Note that the element of Flow only comes into play when two or more
consecutive actions are carried out.) Thus our attitude can be firm or gentle towards Weight, quick or
slow towards Time, and direct or flexible towards Space. An action such as 'dabbing' can be produced
by displaying a light attitude towards Weight, a direct spatial trajectory, and a quick attitude towards
Time. Changing the attitude of one of these elements results in a different action. For instance, if the
spatial trajectory becomes more flexible in approach then the dabbing action becomes one of
'flicking'.
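Laban's scheme is combinatorial: two opposite attitudes for each of the three motion factors yield 2 × 2 × 2 = 8 fundamental actions. The sketch below encodes this table in Python; only 'dab', 'flick', 'punch' and 'press' are named in the text above, while the remaining action names follow Laban's standard effort table rather than this paper:

```python
# Laban's eight fundamental actions as combinations of opposite
# attitudes towards the motion factors Weight, Time and Space.
# (Only dab, flick, punch and press are named in the text above;
# the other names follow Laban's standard effort table.)
ACTIONS = {
    # (Weight, Time, Space) : action
    ("firm",   "quick", "direct"):   "punch",
    ("firm",   "quick", "flexible"): "slash",
    ("firm",   "slow",  "direct"):   "press",
    ("firm",   "slow",  "flexible"): "wring",
    ("gentle", "quick", "direct"):   "dab",
    ("gentle", "quick", "flexible"): "flick",
    ("gentle", "slow",  "direct"):   "glide",
    ("gentle", "slow",  "flexible"): "float",
}

def change_attitude(action, factor, new_value):
    """Return the action obtained by flipping one motion factor."""
    weight, time, space = next(k for k, v in ACTIONS.items() if v == action)
    attitudes = {"Weight": weight, "Time": time, "Space": space}
    attitudes[factor] = new_value
    return ACTIONS[(attitudes["Weight"], attitudes["Time"], attitudes["Space"])]

# The example from the text: making a dab's spatial trajectory
# flexible turns it into a flick.
print(change_attitude("dab", "Space", "flexible"))  # flick
print(change_attitude("dab", "Weight", "firm"))     # punch
```

As the text notes, changing a single attitude moves the action to a neighbouring cell of this table.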
I propose that these 'motion factors' are also active motion formatives within a musical utterance. A
performer can manipulate the degree of Weight placed on a note, the movements of notes within a
virtual musical Space, the temporal profile, and the manner in which musical ideas travel from one to
the next. The physical actions described above, and many more, are in fact used by the musician in
order to create a vast diversity of expressions. For instance, to produce a piano, staccato note on the
clarinet, the action of the tongue against the reed would be one of dabbing. Increase the Weight or
force of the tongue movement, combined with an increase in the force of the air column, and the action
goes towards punching. Decreasing the 'quickness' of the dabbing, on the other hand, would produce a more
sustained action, referred to by Laban as 'pressing'.
Weight, Time, Space and Flow can be seen in actions and heard in music. Within a single movement,
these motion factors constantly fluctuate, yet, a physical or acoustic 'punch' is still recognised as a
single gesture. This may function in the same way as Clarke (1987) describes our perception of a
non-musical sound event, such as the jangle of a bunch of keys. Clarke notes that we would recognise
the source of the sound as a unit of information i.e. 'keys' and not as a highly complex sound
spectrum. Using Gibson's (1960) theory this is possible through the perception of invariants in the
acoustic array, which signify the identity of the event. My conjecture is that the motion factors may
constitute fluctuating invariants that combine to form meaningful units of perceptual information
pertaining to the expressive motion of physical and acoustic events. For music, this means that a
passage of music may be heard as a complex array of gestures and other body-based movement
sequences. However, this theory does not suggest that musical motion must allude to physical motion
for it to be meaningful. Rather, it is proposed that the perceptual elements, which are available in both
musical and physical motion constitute a meaningful language in themselves.
The production of a singular gesture is clearly only part of our movement potential. The human body
is capable of complex sequences of gestures, different gestures produced simultaneously and motion
sequences without distinct gestures. It is feasible within Laban's theory to account for all of these
possibilities, although it is beyond the scope of this paper to describe the entire theory and how
practitioners have interpreted it. This theory does not attempt to describe the endless possible
variations of a movement, but the basis of how all movements and motion sequences may be
constructed.
The Meaning of Flux: Musical Motion as Interpersonal Communication
Laban (1960; see also Carpenter 1965) addresses the interpersonal meaning of motion through
describing the 'inner participations' of the four motion factors. These 'inner participations' constitute
the mental attitudes that can be displayed by manipulation of the physical motion factors. In the
creation of meaning, Weight corresponds to a person's Intention, Space to their Attention, Time to
their Decision and Flow to their Progression. For an in-depth discussion of this process see North
(1990), who uses this aspect of Laban's theory for personality assessment through movement
observation.
The motion factors and their corresponding 'inner participations' are used for the training of actors at
the 'Drama Centre', London. Briefly, an actor can explore their Intention towards another character by
manipulating their use of the element of Weight, both in their bodily movements and in their voice.
For instance, by increasing the weight of their hand tapping on a table, perhaps combined with an
increase in the force of their vocal utterances, an actor can display increased Intention to exert their
will upon another character/event. Similarly, if an actor wanted to display fixed pinpointed Attention
they would use direct actions in space, be it a direct hand movement or a fixed glare. Flexible
movements in space, by contrast, can display non-directed Attention, where the hand waves around and the eyes do not
come to fix upon a certain point.
If we can hear motion sequences and gestures within music, is it possible that we can also perceive
information pertaining to inner attitudes within the acoustic array? I believe that it is feasible that a
sustained increase in the force of a musical phrase may be perceived as increased Intention. This
musical phrase could even be perceived as 'dominating' over other musical lines that display less
Intention or force. Similarly, a highly flexible musical line with no appreciable direction could be
perceived as lacking Attention. However, attributing inner psychological states to musical sequences
could only ever be one of many ways to create meaning from a musical experience. Many differing
and plausible accounts of the meaning of music exist today, perhaps reflecting the multiplicity of
meanings which music can elicit for different people. For instance, Meyer (1956) suggests that the
stimulation of emotions from 'arousal and inhibition' sequences in music creates meaning. This can
occur where knowledge of the musical syntax used leads us to expect certain outcomes, which can
then be delayed or changed unexpectedly. Nketia (1966) describes the meaning of traditional African
music as, amongst other things, reinforcing social structures and community consensus for its
participants. Salgado Correia and Davidson (1999) describe how performers create meaning in a work
by means of their own individual 'metaphorical projections', which are grounded in body-based
experience.
Applications
The implications of the perception of dynamic elements in music are many for the fields of music
psychology, music education, and for the wider domain of non-verbal communication. However, one
application of the theory proposed in this paper that warrants special mention is in the field of Music
Therapy. Many Music Therapists note that it is difficult to describe the musical material that arises out
of a session with a client (Pavlicevic 1998, Ansdell 1997, Bunt 1994). Traditional and structural
musical analysis, which tends to separate music into elements of melody, harmony and rhythm, may
only explain a small part of the client's musical communication. This becomes especially true when
the client has had little musical training/experience prior to therapy. Thus, Music Therapists have
relied on descriptive language in order to discuss the complex feelings and interpersonal musical
dialogues that arise during a session.
Analysing a musical communication in terms of 'dynamics', or motion impulses, may provide more
insightful information for the therapist to gain knowledge of the nature and manner of the client's
expression. An analysis of this type could be implemented on many levels: from simply noting which
motions are present or absent in a client's expressions to a fuller personal analysis of the type
described by North (1990) using the 'inner participation' theory mentioned above.
Bunt, describing Music Therapy with children, notes that the therapist must not simply imitate the
music of a child in order to join with them, but they must somehow get, "behind the sounds to make
contact with the child's world of feeling" (Bunt 1994). Possessing an in-depth understanding of human
motion expression could help the therapist understand, complement and suggest new directions for a
client's expressive musical communication.
Summary
It was proposed at the beginning of this paper that both music and movement share a common
'language' of dynamic, motion-based elements. Evidence has been presented which suggests that
common, transmodal perceptual elements are available in the acoustic and visual array. Further
studies have shown that these elements contain information pertaining to the expressive qualities of
motion. Laban's 'motion factors' have been utilised to describe the possible construction and operation
of these perceptual elements. Continuing from this theory, the perception of inner attitudes and mental
states from motion perception has also been discussed. It was proposed that the motion-based
perceptual elements, termed 'dynamics', could provide meaningful information alone, or they could
signify information similar to that of interpersonal communication. However, it has been noted that
this could account for only one facet of the meaning and significance of music for both listener and
performer.
References
Ansdell, G. (1995) Music for Life. Aspects of Creative Music Therapy with Adult Clients. London:
Jessica Kingsley.
Bower, T. J. R. (1974) Development in Infancy. San Francisco: W.H. Freeman and Company.
McLaughlin, J. (1992) Nonverbal Cues. In S. Kramer and S. Akhtar (Eds.). When the Body Speaks. Psychological Meanings
in Kinetic Clues. Northvale, New Jersey: Jason Aronson Inc. pp. 131-161.
Nketia, J.H.K. (1966) A Review of the Meaning and Significance of Traditional African Music.
Accra: Institute of African Studies. University of Ghana.
North, M. (1990) Personality Assessment Through Movement. 2nd Edition. Plymouth: Northcote
Papousek, M. (1996) Intuitive Parenting: A Hidden Source of Musical Stimulation in Infancy. In I.
Deliege and J. Sloboda (Eds.). Musical Beginnings. Origins and Developments of Musical
Competence. Oxford: Oxford University Press. pp. 88-108.
Repp, B.H. (1993) Music as Motion: A Synopsis of Alexander Truslit's (1938) Gestaltung und
Bewegung in der Musik. Psychology of Music 21, 48-73.
Runeson, S. and Frykholm, G. (1983) Kinematic Specification of Dynamics as an Informational Basis for
Person-and-Action Perception: Expectation, Gender Recognition, and Deceptive Intention. Journal of
Experimental Psychology: General, Vol. 112, No. 4, 585-615.
Salgado Correia, J. and Davidson, J.W. (1999) Meaningful Musical Performance. Unpublished paper.
Spelke, E.S. (1976) Infant's Intermodal Perception of Events. Cognitive Psychology 8, 553-560.
_____ (1987) The Development of Intermodal Perception. In P. Salaptek and L. Cohen (Eds.).
Handbook of Infant Perception. Vol. 2. From Perception to Cognition. Orlando, Florida: Academic
Press, Inc. pp. 233-267.
Stern, D.N. (1985) The Interpersonal World of the Infant. A view from Psychoanalysis and
Developmental Psychology. New York: Basic Books, Inc.
Storr, A. (1992) Music and the Mind. London: HarperCollins Publishers.
1 Introduction
Categorisation is the process of detecting structures and similarities between the
objects in the world, and grouping similar objects together into classes. This process
lies at the basis of most human cognitive activities. Equally, similarity and
difference relations play a fundamental role in the internal structure of a musical
piece, and in our musical understanding (e.g., [Deliège, 96]). Many theories and
analytical methods in music, such as traditional morphological analysis, paradigmatic
analysis, pitch class set theory, motivic analysis and so forth, are based on
similarity relations.
A problem with a categorisation-based approach to music analysis is that often
the categories in musical pieces are chosen intuitively, making it difficult to justify
the choice of a specific class for a musical segment, and introducing inconsistencies
into the analysis. In this paper we address this problem by presenting a formal
approach to categorisation which is based on a clustering algorithm that operates
on well-defined descriptions of musical segments, and we apply this approach to
the analysis of a musical piece, namely, Boulez’ Parenthèse, a movement from his
3rd piano sonata.
The rest of this paper is organized as follows: in the next section, we briefly
describe Boulez’ 3rd piano sonata and Parenthèse, and we discuss the challenges
that this piece poses to the analyst. Then, we explain in detail our formal approach
to categorisation, including segmentation of the piece, representing the segments
in terms of musical features, and clustering these representations with a computational
algorithm. Section 4 describes the categorisation experiments that were
carried out, and the results of these experiments are presented in section 5. In
section 6 we discuss these results and suggest directions for future research.
(Many thanks to Gert Westermann and Fred Howell.)
In order to capture this structure, the analysis of the entire piece can be split
into three parts: first, the analysis of the six obligatory fragments, second, the
analysis of the optional fragments in parentheses, and third, the relation between
obligatory and optional fragments. In this paper, we demonstrate a full analysis
of the first part, that is, the obligatory fragments of the piece.
Within Parenthèse we observe different similarity relations between its segments:
first, the dodecaphonic "repetition-difference", which is based on the use
of pitch class sets, and second, similarity relations in musical properties such as
rhythm and tempo, tonal centres, intervals, contour, and way of playing.
The method of analysis that we present in this paper aims to bring out these
relations. The aim is, on the one hand, piece-specific: to demonstrate the structure
of the obligatory part of the piece. On the other hand, a more general aim is to
demonstrate how the formal method of analysis, that has previously been shown
to work for monophonic pieces ([Anagnostopoulou and Westermann, 97]) can
be applied to a non-monophonic, atonal piece of music with very rich internal
relations, and where a hierarchical segmentation is needed.
3 The Analysis
The analysis method is a formalised and extended version of Paradigmatic Analysis
[Ruwet, 1996; Nattiez, 1975]. The formalisation consists in dividing the
analysis process into discrete steps, fully specifying the representations at each
step, and performing the clustering of the musical segments with a well-defined
algorithm.
The analysis process is illustrated in figure 1. First the piece [1] is broken down
hierarchically into smaller segments, and then each of the segments is described
as a set of properties. The description of the segments is then turned into an appropriate
computational input in the form of feature vectors, and the classification
algorithm takes this input and produces a hierarchic classification of the segments.
The result of this process is a categorisation analysis that makes similarity relations
explicit. In the following sections, we describe each step in detail.
[Figure 1: the analysis process. Segmentation of the piece → segments as sets of properties → transformation into feature vectors → classification of segments (Sim-Cat analysis), with a list of musical properties feeding the segment descriptions and a re-evaluation loop back into the process.]
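The step in figure 1 from "segments as sets of properties" to feature vectors can be pictured as a small transformation. The fragment below is a hypothetical sketch (the paper does not prescribe any implementation, and the function name and toy data are invented): each segment becomes a binary vector with one component per property in a fixed property list.

```python
# Hypothetical sketch of the pipeline step from "segments as sets of
# properties" to "feature vectors": each segment becomes a binary
# vector with one component per property in a fixed list.

def to_feature_vectors(segment_properties, property_list):
    """Map {segment: set-of-properties} to {segment: 0/1 tuple}."""
    return {
        seg: tuple(1 if p in props else 0 for p in property_list)
        for seg, props in segment_properties.items()
    }

# Invented toy description of two segments, using property names
# drawn from Table 1 of the paper.
properties = ["3-1(12)", "longn", "cresc", "semit"]
segments = {
    "1a": {"3-1(12)", "longn"},
    "1b": {"longn", "cresc", "semit"},
}
print(to_feature_vectors(segments, properties))
# → {'1a': (1, 1, 0, 0), '1b': (0, 1, 1, 1)}
```

These 0/1 tuples are the kind of input a clustering algorithm can consume directly.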
3.1 Segmentation
In most formal methods of analysis, the music piece is first split into segments. It
is important to consider that the precise way in which the piece is segmented has
a profound influence on the outcome of the analysis.
[1] By using the term piece we mean the obligatory fragments that are analysed here.
In Parenthèse, segmentation is an easier task than for most pieces, since in
most places the segmentation points are clearly indicated by the composer. We
define segment boundaries
intermediate level where we combine certain adjacent low-level segments: for example,
the low-level segments 1a and 1b form the intermediate-level segment 1ab.
By this we hope to capture similarities that exist between the different segmentation
levels.
Two kinds of properties can be distinguished:
- properties that are true for a part of the segment, for example, the existence of a specific interval in the segment
- properties that are true for the whole of the segment, for example, a rising melodic movement
In our approach we mainly make use of the second kind of properties, with the
exception of specific rhythmic and intervallic patterns that describe merely part of
a segment.
Table 1 shows the properties that we use in the analysis, and the segments in
which they are found. The properties considered here are:
- the existence of various pitch-class sets and certain common subsets that they share. The reason to consider the common subsets is to reinforce similarity between the sets that the composer uses, which are indeed very similar to each other. In order for a pitch-class set to be true for a segment, all the notes of the segment have to belong to the pitch class set.
- tempo and dynamic descriptions. The composer is very specific about which tempo and dynamic descriptions he uses, and these are important for the distinction of the segments and the overall structure of the piece, so in a classification task of this piece they should be taken into account.
- tonal centres, which in this case are single tones rather than keys, and relations between tones; significant intervals that the composer seems to favour; and contour information.
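In code, each such property can be read as a predicate over a segment, and the pitch-class-set condition reduces to a subset test. A minimal Python sketch (the function name and pitch-class encoding are assumptions; it tests membership under transposition only, and the paper does not state whether inversions of a set are also admitted):

```python
# Hypothetical sketch: does a pitch-class set property hold for a
# segment? The property is taken to be true when all of the segment's
# pitch classes fit inside some transposition of the given set.
# (Assumption: transpositional equivalence only.)

def property_holds(segment_pcs, pc_set):
    segment = {p % 12 for p in segment_pcs}
    for t in range(12):  # try all twelve transpositions of the set
        transposed = {(p + t) % 12 for p in pc_set}
        if segment <= transposed:
            return True
    return False

# A chromatic trichord fits the chromatic set {0, 1, 2} ...
print(property_holds([8, 9, 10], [0, 1, 2]))  # True
# ... but a segment containing a tritone does not.
print(property_holds([8, 9, 2], [0, 1, 2]))   # False
```

Running every such predicate over every segment yields exactly the kind of property-by-segment table shown in Table 1.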
3.3 Classification
The classification of the segments, which are represented as feature vectors, is carried
out with a computational algorithm. This approach differs from the traditional
Ruwet/Nattiez Paradigmatic Analysis in that
[Table 1: property × segment matrix over the lowest-level segments 1a, 1b, 1c, 2a, 2b, 2c, 3, 4a, 4b, 4c, 4d, 5, 6a, 6b; the column alignment of the "y" entries is not reproducible here. Row labels, in order: pitch-class sets and common subsets 3-1(12), 4-1(12), 7-2, 6-9, 5-2, 5-5, all, inv 012, inv +3, inv +5, inv +7; rhythmic patterns longn, Q,Q, 4note, SQdot, triplex; tempo and playing directions exact, précip, cédé, mf+, cresc, dimin, steady; tonal centres and intervals Gis/Aes, G,Gis,A, D, Cis,D,Dis, semit, tritone, third; contour wob, down1, down2, up2.]
Table 1: The lowest-level obligatory segments (1a, ..., 6b) and the properties that
are true for each segment. When a property exists in a segment, this is marked
by a "y". When a property is true for a bigger segment but not for
the lowest level, it is marked in the lowest-level segments that the bigger
segment is made from, using "y-", "-y", "-y-", according to which adjacent segment the
property is shared with. The first part of the table contains the pitch-class sets and
their common subsets, the second part contains the rhythmic patterns, the third
part contains the directions by the composer on tempo and way of playing, the
fourth part contains tonal centres and specific intervals, and the last part contains
contour information.
Figure 3: Construction of a GNG network. Small circles represent input data, and
large circles connected with edges are the units of the network.
a cluster, expressed in the probability distribution of the feature values of their
cluster members, and the distances between the units can be measured to gain
information about the similarity between the classes.
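The paper performs this clustering with a Growing Neural Gas (GNG) network. As a rough illustration of the same idea, grouping binary feature vectors by distance and reading off classes at a chosen granularity, here is a plain single-linkage agglomerative sketch under Hamming distance (an illustrative substitute, not the paper's algorithm; the segment names and vectors are invented toy data):

```python
# Illustrative substitute for the paper's GNG clustering: single-linkage
# agglomerative clustering of binary feature vectors under Hamming
# distance. Segment names and vectors below are invented toy data.

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def cluster(vectors, n_classes):
    """Merge the closest clusters until n_classes remain."""
    clusters = [[name] for name in vectors]
    while len(clusters) > n_classes:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(hamming(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Invented toy data: four segments, three binary properties each.
segments = {
    "1a": (1, 0, 0),
    "1b": (1, 1, 0),
    "2a": (0, 0, 1),
    "2b": (0, 1, 1),
}
# Cutting at different numbers of classes exposes the hierarchy, much
# as observing the GNG network while units are inserted does.
print(cluster(segments, 2))  # → [['1a', '1b'], ['2a', '2b']]
print(cluster(segments, 4))  # → [['1a'], ['1b'], ['2a'], ['2b']]
```

Requesting 5, 7 or 8 classes from such a procedure corresponds to the class counts examined in the experiments below.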
4 Experiments
We performed four experiments:
In the first experiment, the classification algorithm was trained on the feature
vectors that represent the segments on the smallest level only: 1a, 1b, 1c, 2a, 2b,
2c, 3, 4a, 4b, 4c, 4d, 5, 6a, 6b. The properties that stretch over adjacent smallest-level
segments were not taken into account.
In the second experiment, the algorithm was again trained on feature vectors
representing the smallest-level segments, but this time they were enhanced with
those features that stretch over segment boundaries. For example, if segment 1
has a property a that is not reflected in its sub-segments 1a, 1b, and 1c, then here
these sub-segments inherited this global feature.
In the third experiment, all segmentation levels were represented in parallel
and the algorithm was trained on the full set of lowest-level segments 1a,. . . ,6b,
the highest level segments 1,. . . ,6, and certain middle-level segments such as 1ab,
4bcd, and so on. In contrast to experiment 2, the lowest-level segments were only
represented by their own properties and not the shared ones.
In the fourth experiment, we considered only a selection of eight segments
drawn from all the levels: 1ab, 1c, 2, 3, 4a, 4bcd, 5 and 6.
By comparing the developing network architecture over a period of insertion
of units, we were able to observe the hierarchy of classes.
5 Results
Table 2 shows the results of experiments 1 and 2, when the number of classes is 5
(that is, when the network has inserted 5 units). Table 3 shows the results in the
same two experiments, when there are 7 and 8 classes.
The results of experiments 1 and 2 are all intuitively acceptable, although those
from experiment 2 seem slightly better. Table 2 shows the results of experiment
1 with 5 classes: here, 1a, 2b, 2c, 4b, 4c, 6b belong to the same class. This
classification would have been better if segments 1a and 6b were in a different
class from the others, since they contain long notes whereas the other segments
contain shorter notes. This difference could be enhanced by introducing an extra
Class | Exp 1 | Exp 2
Class I | 2a, 4d | 2a, 2b, 2c, 4b, 4c, 4d
Class II | 3, 4a | 1c, 5a
Class III | 1a, 2b, 2c, 4b, 4c, 6b | 1b, 6a
Class IV | 1b, 6a | 3, 4a
Class V | 1c, 5a | 1a, 6b
Table 2: The experimental results in the first two experiments when the number
of classes is 5.
2 classes | 2a, 2b, 2c, 4b, 4c, 4d, 1b, 6a, 1a, 6b | 1c, 5a, 3, 4a
4 classes | 1c, 5a | 3, 4a
8 classes | 3 | 4a
9 classes | 2b | 4c
In experiment 3 all levels of segments are taken into account. The results for
5 and 8 classes are shown in table 4. In this case we often get segments and
their subsegments classified in the same category, since they share many of their
properties (for example, segments and subsegments of 2 and 4). This problem
cannot be avoided in such a setting, and the results need further interpretation in
order to be valid. For this reason, 5 classes seem to be too few for an
acceptable classification. When the number of classes increases to 8, the results
improve: 3 and 4a are correctly classified into a category of their own, and the
same holds for 1b and 6a. It is interesting to see segment 4 in a category of its
own, since it is the longest segment of all. Segments 2 and 4bcd are placed in the
same category and are an example of similarity across levels. In general, 8 classes
seem to be sufficient for demonstrating the symmetry of the segments, although
one needs to consider carefully which segments denote this and which are merely
related subsegments of the same bigger segment.
Experiment 4 is the simplest experiment because we consider only a selection
of 8 segments across levels. These are chosen in order to show the structure of the
piece. Table 5 shows the resulting classification when having 4 classes: the first
and last segment, 1ab and 6, are classified together, and the same is true for 1c
and 5, 2 and 4bcd and 3 and 4a. These segments are almost mirror images of each
other, and define the symmetrical structure of the piece.
Class Exp 3 (5 classes) Exp 3 (8 classes)
Class I 1c, 3, 4a, 4, 4ab 1c, 5a, 4ab
Class II 1a, 2b, 2c, 4b, 4c, 6b, 2bc 1a, 2b, 2c, 4b, 4c, 6b
Class III 2a, 4d, 2, 2ab, 4cd, 4bcd 1, 6, 1ab, 1bc
Class IV 1b, 6a, 1, 6, 1ab 2a, 4d, 2, 2ab, 4cd, 4bcd
Class V 1bc 4
Class VI 1b, 6a
Class VII 2bc
Class VIII 3, 4a
Table 4: The experimental results in the third experiment when the number of
classes is 5 and 8.
Class Experiment 4
Class I 1c, 5
Class II 2, 4bcd
Class III 1ab, 6
Class IV 3, 4a
Table 5: The experimental results in experiment 4, with 4 classes.
6 Conclusions
We presented a formal method of analysis based on categorisation of music seg-
ments according to similarity. We applied this method to the analysis of Boulez’
Parenthèse from the 3rd piano sonata, taking into account the obligatory fragments
of the piece. The resulting hierarchic classification defines the similarity and dif-
ference relations between classes and between segments. We demonstrated how a
classification analysis is appropriate for this piece and how it brings out the
symmetrical structure that the composer intended. This method of analysis,
previously shown to work on more traditional kinds of music, proves appropriate
here for an atonal, non-monophonic piece of music.
The results give many interesting insights into the obligatory fragments. In
terms of internal relations, it is a very rich piece: each note is situated in its
position for a variety of reasons, forming part of an overall plan. More specifically, we see
that the piece also has an interesting tonal structure, evolving mainly around G
sharp at the beginning and end, and around D in the middle of the piece. The pitch
class sets used are very similar to each other, with segments 2 and 4 sharing sets, and
the same for segments 1, 3 and 6. Dynamics and tempo seem to be very important
for the segmentation and the differences between subsegments, whereas contour
information seems to reflect the symmetrical structure of the piece.
The issue of hierarchic segmentation in a classification task poses interesting
challenges to the analyst. When classifying all the levels at the same time, on the
one hand we obtain interesting similarities across levels, but on the other we
obtain redundant similarities between segments and their own subsegments.
The results depend on the initial representation, that is, the choice of properties
according to which each segment is described. A different choice of properties
would yield different results. However, a poor resulting classification would
indicate that the initial properties were not chosen carefully and need
re-evaluation. The analyst can then revise the properties and reclassify,
repeating this procedure until an acceptable classification is produced.
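As an illustration, this classify-and-revise procedure can be sketched as a simple similarity-based clustering over segment property vectors. This is only a minimal sketch, not necessarily the method used in the paper; the segment names and binary property values below are illustrative, not the actual analysis data.

```python
# Minimal sketch: classify music segments by similarity of their property
# vectors. Each segment is assumed to be described by binary properties
# (e.g. "contains long notes"); the vectors below are invented examples.

def hamming(a, b):
    """Number of properties on which two segments differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(segments, k):
    """Agglomerative clustering: start with one class per segment, then
    repeatedly merge the two closest classes until k classes remain.
    Class distance = minimum pairwise segment distance (single linkage)."""
    classes = [[name] for name in segments]
    while len(classes) > k:
        best = None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                d = min(hamming(segments[a], segments[b])
                        for a in classes[i] for b in classes[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        classes[i] += classes.pop(j)
    return classes

# Hypothetical property vectors for four segments
segments = {
    "1a": (1, 0, 1, 0),
    "6b": (1, 0, 1, 1),
    "3":  (0, 1, 0, 0),
    "4a": (0, 1, 0, 1),
}
print(classify(segments, 2))  # prints [['1a', '6b'], ['3', '4a']]
```

Re-running such a loop with a revised property set is the programmatic analogue of the analyst's re-evaluation step: a poor grouping prompts a change in the representation, not in the clustering itself.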
The principles of similarity and difference are common to the vast majority of
the musical repertoire. It can be argued that they are responsible to a large
extent for cohesion and coherence within a musical piece.
Self-Regulation and Musical Practice: A Longitudinal Study
Proceedings paper
In contrast to these expertise-oriented perspectives, musical practice can also be studied in terms of the
self-regulated processes that students use to study their instrument. For many schoolchildren, practice plays a role
that is close to homework (Xu & Corno, 1998). Effective practice, like efficient homework, requires
self-regulation, which is evident when students are "metacognitively, motivationally, and behaviorally active
participants in their own learning process" (Zimmerman, 1986, p. 308). In this conception, self-regulation is not
seen as a fixed characteristic, but rather as a set of context-specific processes that students select from in order to
accomplish a task (Zimmerman, 1998). The degree to which these self-regulatory processes are employed by
students depends on six dimensions, which appear to be consistent across a range of diverse disciplines such as
music, sport and academic learning (Zimmerman, 1994, 1998). Reinterpreted for musical practice, these
dimensions incorporate:
1. Motive - feeling free to and capable of deciding whether to practise
2. Method - planning and employing suitable strategies when practising
3. Time - consistency of practice and time management
4. Performance outcomes - monitoring, evaluating and controlling performance
5. Physical environment - structuring the practice environment (e.g., away from distractions)
6. Social factors - actively seeking information that might assist (e.g., from another family member, teacher,
practice diary or method book).
We consider this self-regulatory perspective to be particularly attractive. Not only does it enable us to clarify key
processes involved in efficient musical practice and to compare these with other disciplines, but it may lead to a
more complete understanding of musical learning, with implications for optimising practice. Consequently, the
present study is grounded in this perspective.
Parameters of the Study
So far, most research has concentrated on defining the processes that lead to expert performance, often through the
use of retrospective accounts and studies in which performers are asked to prepare researcher-assigned pieces for a
formal performance. Relatively little research has studied practice in naturalistic settings, free of
researcher-imposed restrictions. Another gap in the existing literature concerns the very beginning stages of learning
an instrument and particularly what young children actually do when practising their instrument at home. To
expand knowledge in this area, we analysed the videotapes of children's home practice using a procedure that
attempted to make our observations as 'normal' and therefore as ecologically valid as possible. The data obtained
from the analyses of these videos were used to supplement information obtained from regular, detailed interviews
with a larger sample of 156 children in eight primary schools (K-6) who commenced learning at the same time,
and from interviews with their mothers, classroom teachers, and instrumental teachers. Our purpose was to
synthesise findings from the larger sample of interview data with new information obtained from analysing
children's videotaped practice, in a way that would shed light on the self-regulatory processes outlined earlier.
Procedure
At the beginning of the study all the children and their parents were invited to participate in the videotaping of
practice. Before the taping commenced, the 27 parents and their children who agreed to participate were
interviewed in order to explain the purpose of the study and in order for the researchers to stress that the home
practice sessions should be as normal as possible, and representative of how each child generally practised his or
her instrument. After viewing all videotapes, seven children (three females and four males) were selected for the
analysis reported here. The rest were excluded because their videotaping of practice was irregular or because
the child's behaviour appeared to be unduly influenced by the recording situation. Two were novices, three had
learnt another instrument (e.g., piano) which they ceased playing before joining the school instrumental program,
and two were continuing to play piano while beginning their new band instrument. The sample consisted of two
trumpets, two clarinets, and one each of flute, saxophone and cornet.
Tapes of 14 practice sessions undertaken in Year 1 of the study (two for each of the seven children) and 10
sessions from Year 3 (two for each of the five children still learning) were selected for analysis. They were coded
using the software package, The Observer (Noldus Information Technology, 1995), which allows the researcher to
play the videotape at various speeds through a computer interface, and to use various 'channels' to code behaviour.
This process provides highly rigorous data that can be revisited by repeatedly viewing the videotape, although this
rigour comes at a high cost in terms of research time: a 10-minute practice session can take up to 5 hours to code.
Results and Discussion
Results for the analyses can be discussed according to the six self-regulatory processes as defined by Zimmerman
(1994, 1998).
Motive
To understand this dimension of self-regulation, it is necessary to examine the degree to which the children feel
free to and capable of deciding whether or not to practise. Results from interviews before the children commenced
learning (see McPherson, accepted) show that they were able to differentiate between their interest in learning a
musical instrument, the importance to them of being good at music, whether they thought their learning would be
useful to their short and long-term goals, and also the cost of their participation, in terms of the effort needed to
continue improving. During the first year of learning, the children's initial motives and expectations for learning,
as measured by whether they predicted that they would play until the end of primary school, high school or into
their adult lives, coupled with how much practice they undertook, provided a powerful predictor of their
achievement. Children who made the least progress tended to express more extrinsic reasons for learning, such as
being part of the school band because their friends were also involved. In contrast, children who made rapid
progress were more likely to express intrinsic reasons, such as always having liked music or wanting to play
particular pieces for their own personal enjoyment (our presentation will highlight findings from the larger sample
relevant to this aspect of self-regulation).
Method
The method dimension focuses on how the children practised, in terms of the essential conditions that allowed
them to choose or adapt a particular method when they practised. Statistics generated by The Observer revealed
that most of the children's practice consisted of simply playing the piece through without any other strategy being
used (see Table 1 - Year 1: 94%; Year 3: 96%). This type of playing was accompanied by foot-tapping for 4% of
the time in Year 1, declining to 2% in Year 3. It was also introduced by counting the beat aloud for 1% of the time
in Year 1, but this behaviour had disappeared by Year 3. Other strategies such as singing, silent fingering, and
silent inspection of the music each accounted for less than 2% of the total time in both years. No evidence of
chanting or using a metronome was observed.
Interviews with the instrumental music teachers revealed that the standard advice about practice given to the
students was to work for 15-20 minutes a day, 5 days a week, and that this should consist of repeating pieces and
exercises until a degree of fluency is reached. Contrary to this advice, the vast majority (Year 1: 90%; Year 3:
92%) of playing time was spent playing through a piece or exercise only once. Although the children would
occasionally stop and repeat a small section after an error, as soon as they finally reached the end of the piece they
seemed content to move on to another task. This trend was remarkably stable across the 3 years (see Table 1). As
a result, there was virtually no evidence of the deliberate practice strategies which typify expert musicians. For
these children, practice involved a rather superficial coverage of performance literature, with little evidence during
the first three years of the types of self-regulatory strategies that would enable them to more efficiently control
their own learning.
Time
How children plan and manage their time has important implications for how efficient their practice will be. In
Year 1, only 73% of the students' observed videotaped practice was spent playing their instrument. This
percentage rose to 84% by Year 3, suggesting that these five subjects were beginning to spend their time more
efficiently. As shown in Table 1, the vast majority of this playing time was spent on repertoire (Year 1: 84%; Year
3: 93%) with approximately equal time spent on ensemble parts and solo pieces. Technical work (scales and
arpeggios) took up the remainder of playing time (Year 1: 15%; Year 3: 7%), while the presence of playing by
ear, improvising, and playing from memory was negligible. This pedagogically unbalanced 'diet' (McPherson,
1998) is surprising, and reveals that the 'informal' practice found by Sloboda et al. (1996) in more experienced
young musicians had not yet emerged in this group of beginners.
Interestingly, the remainder (Year 1: 27%; Year 3: 16%) of the children's practice time was spent on non-playing
activities. These activities show an interesting pattern of change with skill acquisition. Time spent looking for
printed music to play rose from 45% of non-practising time in Year 1 to 76% in Year 3. Time spent talking or
being spoken to fell from 32% in Year 1 to only 8% in Year 3, largely a function of the reduced presence of other
people in the room in the later sessions. Between Year 1 and Year 3, daydreaming fell from 4% to 3% of
non-playing time, responding to distractions fell from 4% to 2%, and expressions of frustration fell from 3% to
1%. Time spent resting between pieces rose from 3% of non-practising time in Year 1 to 6% in Year 3, possibly a
function of the longer pieces played at this stage.
Table 1 also reveals marked differences between individuals. For example, in Year 1, the least efficient learner
spent only 57% of his time actually playing, while the most efficient learner spent 82% of his time practising.
Research in academic subjects shows that many children actively avoid studying or use less time than allocated
(Zimmerman & Weinstein, 1994). This was also true in our analysis of the videotapes. For our least efficient
learner, 21% of his total session time was spent talking with his mother about his practice tasks in a highly
unfocussed manner, where the child's repeated errors became the primary focus, and a source of considerable
frustration. With some children, there was a high level of reference to the time, with frequent behaviours such as
calling out to a parent to ask if they were "allowed to stop yet". For our sample it appears that a minimum time
limit was often enforced, yet the efficient use of that time was not.
Performance Outcomes
A typical self-regulated approach to practice involves an ability to react by choosing, modifying and adapting
one's playing based on the feedback obtained when performing. We chose to assess this type of performance
outcome by analysing the nature of the children's errors (Palmer & Drake, 1997). Our two trumpet-players had to
be eliminated from this analysis because their pitching was too inaccurate: using aural analysis alone meant that
only the clearer pitching of beginner woodwind instruments and the cornet could be assessed. Clear pitch and
sound-production errors were coded; no attempt was made to assess rhythmic accuracy, which varied enormously
among the sample.
As Table 1 shows, nearly half the errors made by subjects in the 1st year of learning were ignored, which points to
a general inability by the children to self-regulate the accuracy of their music reading. The attentional demands of
learning an unfamiliar instrument, together with the considerable cognitive challenge of learning to read music at
the same time, left the students largely unable to verify their own accuracy. However, there were very large
individual differences between subjects in their self-regulation of accuracy. Table 1 shows the total errors per
minute and the ignored errors per minute for the two subjects with the highest (KR) and lowest (WD) error rate. It
reveals that, while subject KR made many more errors than WD, she also ignored a far higher proportion of
these errors than WD did. WD's regulation of his own accuracy was remarkable in Year 1 and also when we
analysed his practice in Year 3. Most notably, his rate of improvement is very high on the second run-through of a
piece: in Year 1 his error rate fell from 1.4 per min on the first run-through to 0.6 per min on the second
run-through, suggesting that he possessed an outstanding ability to retain a mental representation of his
performance between run-throughs, and to use this as a basis for learning from his errors. In Year 3, the same
phenomenon prevailed with WD. Table 1 shows the ratio of the error rate on the second run-through to that on the
first run-through. Although the frequency of WD's errors had risen (Year 1: 1.4; Year 3: 6.7) because
of a steep increase in the difficulty of the repertoire he was playing, the error rate on the second run-through of his
practice was only 34% of that on the first.
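The run-through measure just described is a simple ratio of error rates. A minimal sketch, using the Year 1 figures quoted above for subject WD (the function name is ours, not the study's):

```python
# Sketch of the run-through improvement measure described above:
# the error rate on the second run-through as a fraction of the first.

def runthrough_ratio(first_rate, second_rate):
    """Return the second-run error rate (errors per minute) as a
    fraction of the first-run rate; lower means faster improvement."""
    return second_rate / first_rate

# WD's Year 1 figures from the text: 1.4 -> 0.6 errors per minute
print(round(runthrough_ratio(1.4, 0.6), 2))  # prints 0.43
```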
Such large individual differences in children's ability to self-regulate the accuracy of their playing can partly be
explained by considering the enormous demands placed on working memory for children simultaneously learning
to read notation, to manipulate the keys or valves on their instrument, and to adjust their embouchure according to
aural feedback. The tradition from which these students come places great importance on learning to read notation
from the first lesson, and for many of them, there is insufficient opportunity to learn to associate their nascent
aural schemata with the notation. The most accurate students in the study were relieved of this high cognitive load
because they had learnt how to read music on instruments such as the piano or recorder before starting on their
band instrument. The seven children fell into three clear groups concerning prior learning, and these corresponded
closely to their ability to monitor their playing. The two children who had previously learnt the piano and were
continuing made an average of 2.6 errors per minute in Year 1; those who had previously learnt an instrument but
had discontinued averaged 7.3 errors per minute; and those who were complete beginners (the two trumpeters)
made too many errors to count. Thus, the children with prior experience in learning another instrument, and for
whom reading had become to a certain extent automatised, displayed a more refined ability to monitor and control
their own playing.
Physical environment
Self-regulated learners are aware that their physical environment should be conducive to efficient learning. There
was a wide range of locations chosen by the children for practice, ranging from the privacy of a bedroom to a
shared family space. Some children would appear in different rooms in different sessions, suggesting that they
were choosing a quiet space according to the family situation on the day. This appeared to give the children access
to help from other family members when they needed it, but also meant that some needed to spend some of their
practice time coping with distractions from siblings, pets, and a television in the next room. Data obtained from
both the videotaped practice sessions and child/mother interviews show that the physical environment was
mostly well-equipped with a music stand and an appropriate chair (on the videotapes only one child stood while
practising). However, differences between children were noticeable. One child practised (in his pyjamas) while
sitting cross-legged on his pillow with the bell of his trumpet resting on the bed. The poor posture of this young
learner could be contrasted with some of his peers, who were more capable of holding their instrument correctly
while sitting or standing with a straight back and suitable playing position.
Social factors
When faced with difficulties, self-regulated learners actively seek help from knowledgeable others. The
observation of family involvement reveals a rich pattern (see Table 1) with a noticeable decline in the
participation of parents between the 1st and 3rd years of learning. In Year 1, one or both parents were present in
the room for 65% of the observed time. (This level of participation may, of course, have been affected by the role
some parents took in being a camera-operator). This time spent in the practice room further broke down into three
parental behaviours: 6% involved a parent 'teaching' the child (i.e. taking a very active instructive role); another
12% can be described as 'guiding' (e.g. "What piece are you going to do first?"); and the remainder (81%) was
spent less actively, simply 'listening'. A large amount of maternal involvement
with some of the children consisted of bolstering motivation and delivering praise ("That sounds fantastic!").
Discussion between parent and child about appropriate practising strategies was found in only one subject, and
this was highly argumentative - certainly falling outside of the parental involvement that might be called
"autonomy-supportive" (Grolnick, Kurowski, & Gurland, 1999). Nevertheless, by the 3rd year of the study, a
higher level of autonomy was observed, with parents present for only 22% of the time, and now almost exclusively
in a semi-listening but supportive capacity.
In Year 1, five of the seven children showed high usage of a practice diary in which the teacher had written down
set tasks. The two trumpeters, who showed poor monitoring of their errors, were not observed referring to a diary
at all. By Year 3, only two children continued to refer to their diary, possibly implying that the other three children
were capable of remembering what had been assigned by their teacher.
Conclusions
Zimmerman (1998) concludes that the self-regulatory processes identified here are distinguishing characteristics
of experts across a number of diverse disciplines that include music, sport and professional writing. He also
believes that they can be found, to a greater or lesser extent, in the early stages of learning. It can therefore be
speculated that musicians who display these characteristics early in their development will be more likely to
practise harder and more efficiently, will display higher self-efficacy about their own capacity to learn, and will be
more likely to achieve at a higher level. Early results from our interview and videotape research show that the
practice habits of the children we studied varied considerably and that there were important differences between
the children on each of the six self-regulatory processes, even from the very earliest practice sessions.
Our results lead us to conclude that a majority of our learners possessed the will to learn their instrument, but not
necessarily the level of skill required to ensure efficient and effective practice. By this we mean that the young
learners were typically excited about learning their instrument and came to their learning as optimistic, keen
participants. However, while their instrumental teachers were making them aware of what to practise, many had
very little idea of how to practise. An important implication therefore is that teachers should spend time during
their lessons demonstrating and modelling specific strategies that their students can try when practising, such as
how to correct or prevent certain types of performance errors. However, such strategies will be ineffective unless
the learners also develop their capacity to monitor and control their own learning. Consequently, teachers should
also devise strategies whereby learners can be encouraged to reflect on the adequacy of their own practice habits,
and especially on how they might invent better ways (such as self-reflective comments in their diaries) that will
help them practise more efficiently. Our preliminary findings suggest that the skills of knowing how to
self-monitor, set goals and use appropriate strategies take time to develop in young children. Helping children to
reflect on their own progress and ability to employ self-regulatory processes may go some way to improving
instrumental instruction, especially for children who do not pick up these skills implicitly.
Realising that our study only scratches the surface of the complex issues which surround the self-regulatory
behaviour of young musicians, we intend to build on the findings reported here in order to construct a more
detailed profile of the participating instrumentalists from the data yet to be analysed. At their most basic level, our
early results, combined with the extensive body of evidence from academic learning, confirm that the six
self-regulatory processes are used to greater or lesser degrees in young musicians as a means of improving
performance. Every time a young musician self-initiates practice, consciously plans what to practise, chooses to
correct their performance, structures their learning environment, or actively seeks information from
knowledgeable others, they come one step closer to refining the self-regulatory processes that will eventually
become automatised. For researchers, the challenge involves expanding and clarifying these issues in a way that
will provide useful information that teachers can use to cater for the wide range of abilities which they encounter
in their everyday teaching.
Note
This research has been supported by a large Australian Research Council Grant (No. A79700682), awarded for 3 years in 1996.
References
Chaffin, R., & Imreh, G. (1997). "Pulling teeth and torture": Musical memory and problem solving.
Thinking and Reasoning, 3(4), 315-336.
Ericsson, K. A. (1997). Deliberate practice and the acquisition of expert performance: An overview.
In H. Jørgensen & A. C. Lehmann (Eds.), Does practice make perfect? Current theory and research
on instrumental music practice (pp. 9-51). Oslo: Norges musikkhøgskole.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the
acquisition of expert performance. Psychological Review, 100(3), 363-406.
Grolnick, W. S., Kurowski, C. O., & Gurland, S. T. (1999). Family processes and the development of
children's self-regulation. Educational Psychologist, 34(1), 3-14.
Gruson, L. M. (1988). Rehearsal skill and musical competence: Does practice make perfect? In J. A.
Sloboda (Ed.), Generative processes in music: The psychology of performance, improvisation, and
composition (pp. 91-112). Oxford: Clarendon Press.
Hallam, S. (1995). Professional musicians' approaches to the learning and interpretation of music.
Psychology of Music, 23, 111-128.
McPherson, G. E. (1998). Music performance: Providing instruction that is balanced, comprehensive
and progressive. In C. van Niekerk (Ed.). Conference Proceedings of the 23rd International Society
for Music Education World Conference (pp. 397-410). Pretoria: University of South Africa.
McPherson, G. E., & McCormick, J. (1998). Motivational and self-regulated learning components of
musical practice. In T. Murao (Ed.), Proceedings of the 17th Research Seminar of the International
Society for Music Education (pp. 121-130). Johannesburg: University of Witwatersrand.
McPherson, G. E. (accepted). Commitment and practice: Key ingredients for achievement during the
early stages of learning a musical instrument. Paper accepted for the XXIV International Society for
Music Education Research Commission, Salt Lake City, USA, July 10-15, 2000.
Miklaszewski, K. (1989). A case study of a pianist preparing a musical performance. Psychology of
Music, 17, 95-109.
Nielsen, S. G. (1999). Learning strategies in instrumental music practice. British Journal of Music
Education, 16(3), 275-291.
Noldus Information Technology. (1995). The Observer, base package for Windows: Reference
manual, version 3.0 edition. Wageningen, The Netherlands.
Palmer, C., & Drake, C. (1997). Monitoring and planning capacities in the acquisition of music
performance skills. Canadian Journal of Experimental Psychology, 5(4), 369-384.
Sloboda, J. A., & Davidson, J. W. (1996). The young performing musician. In I. Deliège & J. A.
Sloboda (Eds.), Musical beginnings: Origins and development of musical competence (pp. 171-190).
Oxford: Oxford University Press.
Sloboda, J. A., Davidson, J. W., Howe, M. J. A., & Moore, D. G. (1996). The role of practice in the
development of performing musicians. British Journal of Psychology, 87(4), 287-309.
Williamon, A., & Valentine, E. (in press). Quantity and quality of musical practice as predictors of
performance quality. British Journal of Psychology.
Xu, J., & Corno, L. (1998). Case studies of families doing third-grade homework. Teachers College
Record, 100(2), 402-436.
Zimmerman, B. J. (1986). Becoming a self-regulated learner: Which are the key subprocesses?
Contemporary Educational Psychology, 11, 307-313.
Zimmerman, B. J. (1994). Dimensions of academic self-regulation: A conceptual framework for
education. In D. H. Schunk & B. J. Zimmerman (Eds.), Self-regulation of learning and performance:
Issues and educational applications (pp. 3-21). Hillsdale, NJ: Erlbaum.
Zimmerman, B. J. (1998). Academic studying and the development of personal skill: A
self-regulatory perspective. Educational Psychologist, 33(2/3), 73-86.
Zimmerman, B. J., & Weinstein, C. E. (1994). Self-regulating academic study time: A strategy
approach. In D. H. Schunk and B. J. Zimmerman (Eds.), Self regulation of learning and
performance: Issues and educational applications (pp. 181-199). Hillsdale, NJ: Erlbaum.
Proceedings abstract
Mental manipulation of meter
Peter Vazan & Michael F. Schober, New School for Social Research
vazanp01@newschool.edu
Background
The fact that arbitrarily many rhythms can be constructed from a given meter,
and that arbitrarily many meters can generate a given rhythm, suggests that
rhythm and meter are independent. However, the fact that people do not
recognize the same rhythmic pattern presented in different metrical contexts
indicates that rhythm and meter are intricately related.
Aims
In this study we assess the psychological status of meter and its dependency on
the stimulus' accent structure. We do this by examining people's ability to
hear the same pattern in a new way by imposing a different ground (i.e., meter)
on an unchanging rhythmic figure.
Method
Results
Participants took much longer to impose new metrical structures than to play
rhythmic variations. This was especially true for syncopated alignments:
participants took four times as long to impose a new meter whose underlying
beats were not aligned with the sounding events as they did to tap a syncopated rhythm.
Some participants were entirely unable to impose a new meter on some trials.
Conclusions
The difference between constructing arbitrarily many rhythms from one meter and
generating arbitrarily many meters from one rhythm is a principled one--it is
the difference between what is embedded and what embeds (i.e., the framework).
Our findings that people have trouble imposing new meters in the absence of
facilitating acoustic cues show that pattern structure provides strong
constraints on how a sequence is perceived. This suggests that we do not
arbitrarily choose and construct our perceptual frameworks and cognitive
reference points, but rather that they are given and structured by our
environment and the world around us.
Proceedings paper
conversation, participants most often interact by taking 'turns,' and concurrent verbalization with two
persons is rare; in conductor/ensemble frameworks, 'both' participants are producing their outputs
simultaneously under most circumstances. (There are other kinds of activities where these kinds of
simultaneous interactions may be seen: dance, some kinds of sports--rowing, football, and others--and
physical theatre come to mind. We lay these aside for the time being, despite their clear interest to the
study of musical interactions.)
A second comparison with language is instructive. In conversation, both parties use verbalization as
their typical means of interaction. In conductor/ensemble frameworks, the conductor contributes
primarily by means of gesture, and the ensemble contributes primarily by means of sound. In fact, this
two-modality interaction may even be an asset in coordinating synchronized activity. Our overall
framework is one of a closely time-coupled feedback or contribution system in which each
participant's output is closely monitored by the other, who in turn modifies his/her activity and who
contributes this modified output back to the interaction.
Conducting and pragmatic analysis: what we can learn from studies in linguistics
Conductors produce a large number of physical gestures--facial expressions, hand motions, moves
involving the entire torso--which we hypothesize are related to musical structure and musical
expression in some way(s). The question is: how are these gestures to be interpreted? Our approach is
to interpret conductors' gestures with frameworks derived from studies in linguistics. Therefore we
will begin by looking at how researchers in linguistics have related gesture to language.
Physical activities of conductors
A conductor's visual output to the ensemble incorporates a number of interrelated but separate
'streams' of information. At this time we identify four streams (facial expression, handshape, hand
movement, and torso placement and movement), but classify all of these as 'gesture' for simplicity.
We are particularly interested in those gestures which conductors make which are not explicitly
related to giving information about meter and tempo (that is, not related to 'keeping the beat'). This
paper focuses on handshape and hand movement, specifically with the non-baton-holding hand
(typically the left hand) and with facial expression.
Categorization of types of gesture: 'Kendon's continuum'
The study of gesture's relationship to language is being pursued by a relatively small group of
scholars. One of the foremost of these researchers is David McNeill, of the University of Chicago; his
book Hand and Mind (1992) is a summary of years of investigation into the relationship of gesture
and language to thought. One of the topics of the book is the way in which gestures can be
categorized. McNeill begins with what he calls 'Kendon's continuum' (after Kendon 1988). This
involves placing gestures at some point along a line, represented by the following schema (McNeill
1993, p. 37):
Gesticulation > Language-like gestures > Pantomimes > Emblems > Sign languages
McNeill describes the traversal of this line, from left to right, as having certain characteristics: "(1) the
obligatory presence of language declines; (2) the presence of language properties increases, and (3)
idiosyncratic gestures are replaced by socially regulated signs" (ibid.). Thus, gestures may be
categorized by their independence from spoken language, their independently-perceived linguistic
properties, and the relative presence or absence of criteria of well-formedness for some gesture. Let us
now proceed to consider the relationships between this framework for understanding gesture and
speech and the connection between conductors' gestures and ensembles' musical sounds.
● Cutoff (release): left (or right) hand makes a tight loop with a sudden ending
● Cue (for entrance of a part): eye contact with player(s) to be cued, then a pointing or
downward-stroking motion.
However, these descriptions are subject to more localized difference, raising the question of standards
of well-formedness for these emblems; in addition, the degree of change called for in dynamics
requires interpretation. This will be considered in more detail in a subsequent section of this paper.
Pantomime and 'language-like gestures' in conducting
McNeill defines pantomime as depicting objects or actions, without the need for co-occurring speech
(McNeill p. 37). Conducting treatises rarely mention behaviors which resemble pantomime, and when
they do, it is with disdain. Consider the following (from Farberman, p. 27): "The Shakers shake the
baton throughout a beat pattern or in the direction of individual players. They are especially addicted
to prolonged shaking on holds. (Why don't the players reciprocate by shaking their instruments at the
conductor?) The sound scatters as a result of the very wide vibrato employed by the observant
orchestra, a direct result of the shaking baton." Other kinds of pantomime can be observed in
conductors, especially mouthing the words of texts or pantomiming the bowing of stringed
instruments; nevertheless, such behavior is largely unmentioned in treatises. Moving leftward along
Kendon's continuum from pantomime, there seems to be no analog to the language-like gestures
defined by McNeill, where a gesture fills a slot for a term in an otherwise auditory speech stream (for
example, "the parents were all right, but the kids were [gesture]" (McNeill p. 37)).
The leftmost edge of Kendon's continuum includes gestures which are typically co-generated with
speech, have no conventional meanings, and are highly dependent on context for their interpretation.
Many of the 'expressive' gestures of conductors seem to fall into this category; there are relatively few
standards of well-formedness at work. Rather, these kinds of gestures, which are the focus of
McNeill's research efforts, "...are free to incorporate only the salient and relevant aspects of the
context. Each gesture is created at the moment of speaking and highlights what is relevant..." (McNeill
p. 41). These encompass some of the most interesting and least discussed of conductors' gestures.
Grice's Cooperative Principle and maxims: A theoretical approach to pragmatics
One highly influential theory of linguistic pragmatics was laid out by the philosopher H. Paul Grice.
The fundamental components of Grice's theory were presented at Harvard in 1967 as the William
James Lectures and published piecemeal over a number of years (especially Grice 1975, 1978; for a
rather more readable discussion, see Levinson 1983, Ch. 3). These lectures deal with the gaps that
occur in conversational interchange if it is understood only in terms of the logical syntax and
semantics of the statements made by the participants. In order to show how people 'make sense' of
conversations where logic alone would be insufficient, Grice describes what he calls the 'Cooperative
Principle' (CP), which is a generalized proposal for understanding how persons interact in rational
ways. The CP states that one should behave so as to:
Make your contribution such as is required, at the stage at which it occurs, by the
accepted purpose or direction of the talk exchange in which you are engaged.
Grice then gives a set of four 'maxims,' some of which have submaxims, which elaborate on how this
CP can be implemented. These are:
The maxim of Quality: try to make your contribution one that is true, specifically:
(i) do not say that which you believe to be false
(ii) do not say that for which you lack adequate evidence
The maxim of Quantity
(i) make your contribution as informative as is required for the current purposes of the
exchange
(ii) do not make your contribution more informative than is required
The maxim of Relevance: make your contributions relevant
The maxim of Manner: be perspicuous, and specifically
(i) avoid obscurity
(ii) avoid ambiguity
(iii) be brief
(iv) be orderly
These provide the basis for Grice's analysis of how speakers can go beyond the actual words and
forms of sentences they hear, by a process of implicature. Implicature allows one to make sense of
interchanges such as the following:
"do not let your gestures become too involved or confusion will result." [Rudolf p. 248]
The maxim of Relevance (make your contributions relevant)
"The genuinely inspired musical leader concentrates upon meeting the demands of the
music and of the orchestra; he has no time or energy for superficial gestures having only
audience appeal." [Rudolf p. 240]
The maxim of Manner (be perspicuous, and specifically avoid obscurity, avoid
ambiguity, be brief, and be orderly)
"The two extremes to be avoided are shyness and exhibitionism." [Rudolf p. 240]
"...the conductor's...technique...may be defined as a highly individualized craft to evoke
specific responses on the part of the players with the most effective gestures..." [Rudolf
p. 314]
"When and how to use the left hand are matters of individual taste, but it should always
tell the orchestra something essential. If the conductor uses the left hand continually, the
players will ignore it." [Rudolf p. 243]
A deeper look: examining the criteria for implicature to take place
As Grice states his framework, there are a number of conditions which must be met for implicature to
happen: cancellability, nondetachability, calculability, and nonconventionality. Let us take these
conditions one by one and compare them to the conductor/ensemble framework. Only one musical
case per condition will be mentioned; in general, many parallel cases exist as well.
1) The implicature must be able to be canceled by the addition of information that undercuts the
implicature or makes it clear that a speaker is 'opting out' of the CP.
● What kind of additional information might cause an implicature from conductor expression to
be canceled? A variety of examples might be adduced. Take, for example, co-occurring gestures
which undercut the implied intent of some conductor gesture: a hand cue to enter is given at the
appropriate time for a certain performer awaiting a cue, but the conductor's gaze is directed
elsewhere. The implication that the waiting performer should come in is undercut, causing at
the least confusion and, in more extreme cases, some considerable degrees of ensemble chaos.
● A quote from Farberman is in order here: "Many beginning conductors are guilty of indicating
pulse with their head. It is a disruptive gesture that disarms the baton of authority by
supplying an alternative beat...A head pulse, or beat, most likely will not correspond with the
baton beat. Thus, at the point of attack, the conductor offers the orchestra a choice of two
pulses. Most often the result is a weak, imprecise orchestral entrance." [Farberman p. 8,
emphasis in the original]
2) The inference in the implicature must inhere in the semantics of the statements themselves and not
just their surface forms (Grice's term is 'nondetachability'). The use of synonyms in place of the exact
words of an utterance should not undermine the implicature.
● With regard to nondetachability, the application to conductor/ensemble interaction seems more
problematic, as the semantics of the various gestures themselves are often unclear.
● Again, from Farberman: "Is there a correct physical [i.e. conductor's gestural] response to a
musical problem? A single 'correct' musical/physical solution to a musical problem does not
exist. ... It is a given that any two conductors confronted with the same musical problems will
view them differently and devise distinct musical, thus physical, solutions. Even conductors
who could agree fully on the musical meaning of a score would produce dissimilar results
because of their individual motor and muscular skills and unique body structures. ... Conductors
must think of stroke choices just as string players think of bowing possibilities, there are
generally several solutions for most problems. In theory any baton stroke can be used for
any solution, so ALL strokes may be 'correct.' But in practice, the 'correctness' of the stroke
depends on who chooses what stroke and when, and how and to what effect it is used."
[Farberman p. 178, emphasis in the original]
3) Implicatures must be calculable; that is, given some unclear communication, one must be able to
construct a chain of explanation, using what is given in the gestures, and the CP and maxims, to show
how a reasonable interpretation preserving the CP is to be made.
● One might make the case that syncopation provides a good example of a musical context which
shows how calculable implicatures might be made. This example is a little more involved than a
simple quote but is, we hope, illuminating.
● In The Grammar of Conducting, Max Rudolf discusses syncopation in the following way:
"Syncopated passages without accents require no special beat. The gestures must be very
definite and the rhythm steady. You must beat, so to speak, between the notes, not on them. ...
Syncopated notes with accents are indicated on the preceding beat, which is staccato. The
sharpness of the beat increases with the degree of the accent. In contrast with an ordinary
accent, which is on the count, this staccato beat is not prepared. The beat itself is the
preparation for the syncopated note that comes after the count. Again, never beat the
syncopation, beat the rhythm!" [Rudolf, 1949, pp. 207-208]
● What is the player to make of this situation? Normally the increased emphasis on the beat would
indicate that the note on the beat would receive an accent. However, in this case there may be
no note initiated on the beat itself (the example given in the Rudolf text, from Stravinsky's
L'Oiseau de Feu, has only tremolos on the beat, giving no overt sense of rhythm). The chain of
reasoning by the players might run something like this: (1) The conductor is telling us to accent.
(2) However, there is no note to accent at the place s/he is indicating [causing the player to
wonder if the maxim of Relevance--or possibly that of Quality--is being violated]. (3) This
conductor knows what s/he is doing, so the indicated 'accents' must have something to do with
our musical purpose [the player is preferring to believe that the CP is still in force and that the
maxims of Relevance and Quality are somehow still being preserved]. (4) Therefore, it makes
most sense to infer that the accents are simply displaced from the nominally-indicated metric
position, and should be applied to the intervening notes that are on the normally-unaccented
positions [since to show accents on those positions would require indicating twice as many
beats as the conductor is currently giving us, and that would be a violation of the maxim of
Quantity, by giving more information than is really necessary, since the conductor knows we
can make the appropriate implicature to arrive at the correct interpretation].
4) The meanings understood in implicature are nonconventional--that is, they are not part of the
conventional meanings of the words themselves (not part of the literal meanings of the words, but
derived from them through the CP and maxims).
Only gestures which are fully lexicalized would have meanings so governed by convention that they
needed no contextual interpretation; the discussion of dynamic change to follow in the next section
will explicate this further.
Two test cases for pragmatics and conducting
Having assembled all this theoretical machinery, it is time to apply it to musical situations and see
what results. The method we use employs videotapes of conductors in a variety of ways. In a
paper, of course, this method is difficult to convey. Interested readers may contact us for the actual
examples used and more detailed analysis of the examples. In the following we look at processes of
implicature in two cases: interpreting a dynamic-change gesture and interpreting a pantomime gesture.
Case 1: Dynamic alterations and implicature
Gestural patterns indicating changes in dynamics are a mainstay of the conductor's art. Commonplace
as they are, there are a few interesting pragmatic issues surrounding the communication of dynamic
changes between conductor and ensemble. One of these seems to be related to what Levinson,
following Gazdar, calls scalar implicature (Levinson pp. 132-136; Gazdar 1979). Briefly put, the idea
is this: when someone says "John wasted some of his money at the casino," the implicature is that
John did not waste ALL of his money at the casino, even though in a strict logical sense that could be
true as well. Or, if one says "John has three children," one might reasonably interpret the sentence as "John
has three children and no more," when in fact, if John had eleven children, the statement that he has
three children would still be true. Given some ordered scale of terms, such as <none, few, some, many, all>,
to state one of these terms implicates "this level and no more" of that to which the term is being
applied, by means of the maxim of Quantity. Such inferences are the stuff of which scalar
implicatures are made.
We can make a connection to interpreting conductors' gestures regarding changes in dynamics within
the same framework. Our ordered scale of dynamic levels might be <pp, p, mp, mf, f, ff>. Given a
starting dynamic level, under most circumstances a conductor's crescendo or decrescendo gesture can
be interpreted to mean that the ensemble should move from the current dynamic level to the next
higher or lower level, on the same basis as that used for the scalar implicatures just described.
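This step-wise reading of a dynamic-change gesture can be sketched as a toy model. The code below is purely illustrative; the scale, function name, and gesture labels are our own choices, not part of the paper's method:

```python
# Toy model of scalar implicature for dynamic-change gestures.
# Given an ordered scale of dynamic levels, a crescendo or decrescendo
# gesture is read as "move one step along the scale", not "jump to an
# extreme" -- the minimal change selected by the maxim of Quantity.
DYNAMIC_SCALE = ["pp", "p", "mp", "mf", "f", "ff"]

def implied_level(current, gesture):
    """Return the dynamic level implicated by a conductor's gesture.

    A literal (logical) reading would permit any louder or softer
    level; the scalar implicature selects the adjacent one.
    """
    i = DYNAMIC_SCALE.index(current)
    if gesture == "cresc.":
        return DYNAMIC_SCALE[min(i + 1, len(DYNAMIC_SCALE) - 1)]
    if gesture == "decresc.":
        return DYNAMIC_SCALE[max(i - 1, 0)]
    return current

# From mf, a decrescendo gesture implicates mp -- not a subito pp.
print(implied_level("mf", "decresc."))  # -> mp
```

A subito piano, on this model, corresponds to skipping steps on the scale: logically permitted by the gesture, but not the implicated reading.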
However, this is a matter of inference. Max Rudolf observes, disapprovingly, that "Many players have
a tendency to play loudly at once when they see 'cresc.' and softly when their parts indicate 'decresc.'
or 'dimin.'" (Rudolf, p. 60). This indicates that, to his mind, such players are behaving improperly,
when in fact a subito forte or subito piano would be one not-forbidden interpretation of the dynamic
indication. How can this be? Probably to Rudolf, the crescendo or decrescendo indicator is a scalar
indicator, from which the implicature is something like "from our current level, say mf, make a
decrescendo, which implies mp, and not more change than that." The player giving a more abrupt
change in the dynamics is, literally and logically, not wrong--but has failed to produce an appropriate
implicature.
For testing this implicature we used a videotape of a conductor leading an orchestra in a passage by
Mozart. At one point a decrescendo is desired, and the conductor uses the typical palm-down gesture
to indicate this. At this point in the soundtrack for the video, two different versions were produced.
One was the original audio as recorded by the orchestra, and the other was a modified version where
the decrescendo was more in the nature of a subito piano. A group of 8 musicians experienced in
ensemble playing was shown both versions and asked which one most closely reflected the
conductor's intentions, and why. All of the musicians remarked that both were possible interpretations
but that the less-marked version (the original) was preferable. When asked to give the reasons for their
choice, most gave as their reason that the motion made implied a modest decrease in volume but not a
sudden one, lending support to the notion of scalar implicature as experienced in musical
performance.
Case 2: 'Overacting' and implicature
Conducting treatises make many comments on how much guidance a conductor should give an
ensemble, usually to the effect that one should be economical, to the point, and specific in gesture and
word (Farberman, Chapter 27, and Rudolf, Chapter 31, are typical examples). One might see these as
specific applications of the maxims of Quantity and Manner. In fact, many of the behaviors which
musicians specifically dislike in conductors can be traced to conductors' violations of these maxims,
in the general domain of 'overacting.' Let us consider one such case from one of our videos.
In this instance, the conductor is mouthing the words to the music (a large choral/orchestral work of
Mozart). This kind of gesture falls under the category of 'pantomime,' as it mimics the actions of the
chorus. In this instance the maxim of Relevance is being observed as far as the chorus goes, but this is
not the case with the orchestra. One might predict, then, that orchestra members would respond more
negatively than chorus members, because of their somewhat differing interpretations of Relevance.
However, even for the chorus, another maxim might be violated here: Quantity. Mouthing the words
to the chorus is almost always unnecessary (and universally derided by teachers of conducting,
although one sees this behavior even with very prominent conductors); it violates the injunction to
make contributions no more informative than necessary. There may be a related phenomenon at work
as well; there is research which indicates that speakers have a tendency to gesture, but not listeners.
When two participants in a conversation trade roles, the one who had been speaking typically stops
gesturing, only to resume again when s/he adopts the role of speaker again. Thus, for a conductor to
constantly engage in large amounts of gesturing, over and above that which is necessary, might be
interpreted as not sharing responsibility adequately with the orchestra.
To test these implicatures, we played this videotape for the same group of ensemble musicians and
asked them to describe their reaction to the conductor. As predicted by the violation of the maxims of
Quantity and Relevance, all members of the group indicated their dislike for the conductor's mouthing
of the words. When asked why they disliked this, there were two answers: for the singers it was
unnecessary, and for the instrumentalists it was unrelated to their parts.
Concluding remarks
Linguistic theories relating to the study of gesture co-occurring with speech, and to the
interpretation of incomplete or uncertain parts of dialogue, have been shown to be relevant to
understanding the ways in which conductors and ensembles communicate with each other. This paper
has focused on only a few aspects of this, with many other dimensions left for a more extensive
discussion. Our view is that the theories and methods developed in linguistics have much to offer the
study of musical behaviors, especially with regard to interactions between musicians.
References
Farberman, H. (1997) The Art of Conducting Technique. Miami: Warner Brothers.
Gazdar, G. (1979) Pragmatics: Implicature, Presupposition, and Logical Form. New York: Academic
Press.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan, eds., Syntax and Semantics, vol
3. New York: Academic Press, pp. 41-58.
Grice, H.P. (1978). Further notes on logic and conversation. In P. Cole, ed., Syntax and Semantics,
vol. 9. New York: Academic Press, pp. 113-128.
Lerdahl, F., and Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge,
Massachusetts: MIT Press.
Proceedings paper
INTRODUCTION
Charles Rosen's statement 'musicology is for musicians what ornithology is for the birds'
reflects how, in our culture, musical analysis and musical performance diverge. Analysis
tends to describe music as pure sound relations, whilst performance may be regarded as an
embodied, shareable, meaningful action-device: the two forms seem to result from very
different cognitive operations or realms. Consequently, it is not clear to what extent analytical
knowledge of large-scale structure is important for performers to shape their own
performances; or, generally, what role analytical thought has in the creative process of
shaping musical performances.
We also often hear people saying that a particular musical performance was 'very intelligent'
or 'insightful', but what precisely is meant? Ask a number of people the question, and a
divergence between what they say, imagine and think about music will emerge. This range
of individual differences presents the teacher with a great challenge: how to encourage,
increase knowledge, and assist development, yet allow an individual learner to preserve his or
her own ideas about music. Thus, within the educational setting, clear goals and objectives
are necessary. To achieve these goals, a discourse which reflects our experience of music is
required. Musical works need to be submitted, like any other 'signifying system' or 'text' in
every culture, to hermeneutic processes in order to be understood, and discourse is, of
course, a necessary tool, if not an indispensable one, to these hermeneutic processes.
The need for the hermeneutic becomes obvious when one considers that a score itself,
though having a huge variety of indications (from composer's notes accompanying the score
to the title and expression markings), does not provide the necessary acculturation for one to
be able to play stylistically - that is, to be authentic to both the cultural style and the
individual interpretative style. Yet, the indetermination of the score provides a reason why
the interpreter can and should be creative.
In the existing pedagogical literature, the processes involved in conceiving, rehearsing and
performing the expressive elements of a musical work are not given serious attention, with
the emphasis in method books being on technical and formal aspects of playing an
instrument. Anecdotal evidence from a variety of music lesson contexts illustrates that
interpretation is discussed, but is mainly regarded as an account of the notation which draws
upon a large stock of standardized expressive effects acquired through stylistic imitation of a
stereotype - either coming from the teacher, a particular 'school' of playing, or a famous
interpreter.
The teacher's job is to engage the student in creative reading, translation and construction
processes, and, thus, to oppose the general tendency to encourage technically focused
practice, which seems to be potentially constraining. I believe that an expressive and
communicative engagement with music from the very beginning of practice teaches the
students to think expressively and to play intentionally. In the current paper, a series of
practical investigations where cultural symbols were used as hermeneutic triggers and
explored as a means of stimulating creativity in the student will be examined. The basis of
the approach was that the student was guided to develop her/his own ideas about
interpretation, rather than depending upon imitative models. These investigations will be
triangulated with data collected from class observation, interviews with internationally
recognised performers, and recent developments in neuroscience. The main two arguments
of this paper are firstly, that music reveals the bodily origin of meaning, and secondly, that
the performer operates in the moment to make any particular performance more or less
meaningful to him/herself, co-performers and the audience.
In the case of music revealing the bodily origin of meaning, I propose that the focus of
discourse about music, at least in instrumental practice and teaching, should be taken out of
the usual context where either simplistic imitation or else forms of abstract theorizing are
used to convey how the interpretation should be achieved. As will hopefully become clear
from the examples, teaching should be focused on the creative use of action or movement
metaphors and/or expressive physical gestures to ground the understanding of musical
gesture. Musical meaning is inseparable from its physical, presential and temporal
experience.
The performer operates in the moment. Given the 'here-and-now' of the performance
context, the motor programming achieved through practice and experience (both physical
and psychological) and the mental state of the performer during the performance ritual, I
propose that the performer is most likely operating in a concatenationist fashion (cf.
Levinson, 1997). When the performer is grounding his or her performance in expressive
actions and gestures emerging from their individual stock of human traits and experiences,
they are creating the conditions to function, as exclusively as possible, from core
consciousness (cf. Damasio, 1999). During their training, performers should also be
prepared to deal with this physical/psychological state.
NOTE
For all the practical work described in this paper, a number of qualitative research
methodologies were drawn upon: participant observation (myself as teacher and co-worker
with ten students as they prepared for examinations over a period of six months); video
analysis (systematic and critical reflections on recorded lessons and performances
commentaries provided by students and teachers); semi-structured interviewing, with
interpretative phenomenological analysis being used as a framework for data analysis (see
Smith, 1995).
Instrumental class observation shows clearly how teachers communicate to their students
how they feel about the phrasing or the whole atmosphere of the piece: they use physical
gesture and body movement - especially for situations where expression has to do with
external/visible movement - and a wide range of metaphors, especially for situations where
expression has to do with 'inner motion' or 'inner reaction' (like fear, joy, excitement,
elevation, contraction, tenderness, tension build ups and relaxation, suspense or anger).
When rehearsing, performers make a decision, consciously or not, to adopt a particular
context or semantic field for the concerned musical work. It does not make any difference if
the context was chosen in the sequence of a complicated formal analysis or if it was just
suggested by free association. Inspired from the chosen context, performers use movement
images and action-metaphors in order to inject sounds with the right emotional content or, as
Trevarthen (1999) puts it, they use them to coordinate their emotional acting and its
channelling into an imagined narrative of purposes.
For each musical phrase or situation, they take the emotional content (that is, intrinsic
relations of movement and rest, speed and slowness, tension and relaxation, etc.) from the
metaphorical referent, as if abstracting it from the original context of experience in which it
was formerly integrated. When applying this emotional content to the musical sounds, they
make them expressive, but fairly abstract. Kendall Walton (1997) neatly describes this
process as creating 'a smile without a face'. The imagination of the active listener will work to
find a new face to that smile. It is in this sense that one may say that music reveals the
bodily origin of meaning.
From the observations and my own experimentation with students, it seems reasonable to
conclude that the students could easily focus on musical expression after negotiating the
musical material with their bodies, or, in other words, after embodying the musical meaning.
Suggested by the work-in-context, the action metaphors 'inspire' expressive actions and
gestures, which emerge out of our stock of human traits and experiences. To explore the
metaphors is to capture their physical qualities, their intrinsic relations of movement and rest,
speed and slowness, and simultaneously, to explore the flexibility of the musical material to
express these relations. The metaphors work as a way into processes of symbolic
activity, linking both cognitive and bodily structures. Musical communication seems to
happen when it has this bodily basis. To teach musical interpretation is to teach how to
reach within for one's deep bodily structured experiences. In this sense, it is absolutely right
to say that music develops self-knowledge, or, strictly speaking, it involves knowledge
beyond the boundaries of the self.
what he/she decided in the rehearsals. However, as soon as he/she starts to play, the pulse and the emotional variation of movement are back in place, with a strong feeling for the context
where the action happens, the 'here-and-now' of the performance. It is then that new
emotional variations may happen. Variations in emotional intensity, of course, but also
variations caused by the necessity of integrating new elements and factors which occur in
the real time of the performance.
Focused on the flux of emotions, performers, even if only for short periods of time, have
reported feelings of 'the self' being freed, feelings of acting spontaneously in a state of
euphoria, as though they are 'flying', 'taking off' or 'going with the flow', even going into a
form of trance, and so on (see Davidson, 1997, for further examples of such reports). 'The
self' could be described as flowing, to use Csikszentmihalyi's (1990) term. In this state: '... people typically feel strong, alert, in effortless control, unselfconscious and at the peak of their abilities'. Thus the performer gains new insights, becoming spontaneous and genuinely creative in the moment.
To perform is, perhaps paradoxically, not so much to reproduce what was memorised in
rehearsal but to re-live - here-and-now - the devised emotional narrative. When performing,
the performer is both free to be surprised by what comes from the embodied automatic actions assimilated in rehearsal, and free to react spontaneously to any new happenings coming not only from the outward context of the performance but also from its inward context:
"The unconsciously generated emotions and motives are integrated with the discriminating
and strategic operations of consciousness, memory and skill, modulating them. Musicality
implicates and expresses both 'ergic' and 'trophic' representations in the moving mind or
spirit" (Wallin, 1991, quoted in Trevarthen, 1999: 165)
So it can happen that conscious and unconscious reactions to the 'here-and-now' context
take over, and that is when performers experience 'becoming' - a type of experience
described by Deleuze and Guattari (1980): their sense of self is suspended, and they open
themselves to the ground (to their multiple stocks of impressions, emotions, body motions,
etc.) with an acute sense of the here-and-now. Their accounts of time vanishing, of not being in control, of becoming one with the sound, and their dream-state comparisons are all clear signals of the process of self-effacement and the consequent disclosure in their grounded 'becoming'.
"I lost the feeling that I was controlling what was happening.... I became a single
sound and body moving. Curiously, the feeling of time passing vanished, and
there was no moving ahead. It was like the same instant was presenting itself
from many different perspectives. The performance moment was an action
without phases, and I was listening to the music but from a very, very distant
place within myself. Recalling what was going on in my mind during that
performance is as frustrating as trying to recall a dream from which you have just
awoken." Flautist reporting a concert experience...
Therefore, 'becoming' is an act of intimate disclosure. Along this line, becoming takes place in music performances where the musicians take on the role of operators of the moment: expressive actions and gestures emerge that are based directly on their stock of human traits and experiences, which might be considered a ground beyond personality or self-driven concerns.
"... for me, the great concert is a concert where I no longer remember what I did... I took off... it happens when you are very centred... I manage to bring it about every time I manage to kill my ego..." (Patrick Gallois, quote taken from an interview, translated from the French)
CONCLUSION
Thus, a meaningful musical performance is one which is grounded in, and reveals, its bodily origins. What seems really decisive in performance is the gratifying and convincing experience of 'becoming'. If the discourse about music, at least in instrumental practice and teaching, focuses on the creative use of action or movement metaphors and/or expressive physical gestures to ground the musical gestures, then students are prepared both to develop their own ideas about the works they are playing and to deal with the physical/psychological state of performance, thereby creating the conditions for experiencing 'becoming'.
REFERENCES
Barthes, R. (1977). Image-Music-Text (trans. S. Heath). London: Fontana Paperbacks.
Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience. New York: HarperCollins.
Damasio, A. (1999). The Feeling of What Happens. Orlando: Harcourt Brace.
Davidson, J.W. (1997). The social in musical performance. In D.J. Hargreaves and A.C. North (eds), The Social Psychology of Music (pp. 209-228). Oxford: Oxford University Press.
Deleuze, G. and Guattari, F. (1980). Mille Plateaux. Paris: Les Éditions de Minuit.
Johnson, M. (1987). The Body in the Mind. Chicago: University of Chicago Press.
Kivy, P. (1989). The Corded Shell. Philadelphia: Temple University Press.
Smith, J.A. (1998). Doing interpretative phenomenological analysis. In M. Murray and K. Chamberlain (eds), Qualitative Health Psychology: Theories and Methods. London: Sage.
Swanwick, K. (1999). Teaching Music Musically. London: Routledge.
Trevarthen, C. (1999-2000). Musicality and the intrinsic motive pulse: evidence from human psychobiology and infant communication. Musicae Scientiae, Special Issue, 155-215.
Walton, K. (1997). Listening with imagination: is music representational? In J. Robinson (ed.), Music and Meaning. Ithaca and London: Cornell University Press.
Proceedings paper
INTRODUCTION
The analysis of post-tonal music presents problems different from those of tonal music; various procedures and methods have been used to tackle them, both from the point of view of music theory and analysis (Forte, Lerdahl, Narmour, Hasty) and from that of the psychology of perception (Meyer, Imberty, Deliège).
The study we present here is part of the research proposed by the "Gruppo di Analisi e Teoria Musicale" (GATM), a group whose aim is to study
common procedures for the analysis of twentieth century post-tonal music. The group has recently launched a project to investigate the "macroform"
of such music. In this context the term "macroform" is used to indicate a higher system of segmentations which in turn contains segmentations of a
lower order (bibl.). It is necessary first of all to establish a very clear distinction between the concepts of "segmentation" and "macroform":
1. The term segmentation is used to indicate the exact point where two sections are separated: we are therefore dealing with a local
phenomenon, brought about by the presence of a contrast or discontinuity that involves one or more parameters of the musical material
(duration, dynamics, timbre, density, register, etc.);
2. The term macroform is used to indicate the result of a process of memorisation based on the division of a musical text into parts, each having
structural coherence and homogeneity. Such a division is not necessarily caused by the hierarchy of the segmentations: a strong segmentation
does not always, in fact, produce a division into parts; in the same way, it may happen that a division into parts does not coincide with a point
of strong segmentation.
One of the most important methods of research used by the group was the experimental study of perceived answers. The analysis of the results of
listening tests is an important tool for the creation of a theory about macroforms, since in this way the investigation does not simply follow the
already extensively trodden paths concerning the study of the rules of composition, but attempts to tackle the varied and complex problems involved
in actually listening to post-tonal music. In the post-tonal repertory the Nattiezian distinction between the poietic dimension and the aesthesic
dimension takes on a particularly important significance, as a great deal of the difficulties in comprehending this type of music arise from this point.
Generally speaking, the approach taken in most musicological literature tends to be one that favours the study of compositional techniques as
opposed to the analysis of listening strategies. It concentrates more on "what the composer did" than "what comes out of listening to his music". The
file:///g|/Sun/Addessi.htm (1 of 14) [18/07/2000 00:28:14]
Addessi
project set up by the GATM study group, on the other hand, aims to investigate both aspects at the same time, through a research method based on the analysis of perceived answers. In this context the results obtained from studies already performed on perceptive
analysis were an important point of reference, especially those carried out by Michel Imberty (1981, 1987) and Irène Deliège (Deliège 1989;
Deliège and El Ahmadi 1990; Deliège and Mélen 1997).
As far as the repertory is concerned, the investigation has concentrated mainly on string quartets. The choice to work on a timbrally homogeneous
repertory was dictated by the need to limit the number of variables, given the great variety of styles present within post-tonal music. So far analysis
(analysis of perception/ analysis of compositional techniques) has been made of the following pieces: A. Webern, String Quartet op. 5, first
movement; D. Milhaud, String Quartet, first movement; B. Maderna, String Quartet (1942) (Addessi and Caterina 2000).
[Fragment of the analysts' macroform proposals: Macroform 2, Part I: bars 1-11 (sec. 0'00"-27"/30")]
EXPERIMENTAL STUDY
Method
Participants: 43 students took part in the experiment: 25 non-musicians (university students) and 18 musicians (conservatory graduates and conservatory teachers).
Materials: G. Kurtág, Quartet Op. 1, V movement (1959), duration 2'02" (CD WDR Auvidis Montagne MO 789007)
Equipment: EPM programme (this computer programme was devised at the University of Padua by Christian Temporali, and allows each subject to
indicate, by clicking on the mouse, the segmentations perceived while listening in real time), paper questionnaire.
Experimental procedure: each participant was given a questionnaire which, in addition to the written answers, involved three tasks to be carried out using the EPM programme. The tasks were as follows: 1. Listen to the piece to become familiar with it; 2. Listen to the piece and, while listening, record on the computer all the points of separation perceived; 3. Repeat the previous test, without worrying about any possible differences in the answers; 4. Listen to the piece and indicate the main sections perceived on a line drawn on one sheet of the questionnaire; 5. Listen to the piece again and indicate, using the EPM programme, the main sections perceived; 6. Describe in words the characteristics of the sections indicated; 7. Indicate, choosing from a series of possibilities, which element or elements influenced the division of the piece into parts.
Experimental hypotheses
The operative hypotheses we will deal with in this paper regard test number 5, which involves the perception of the macroform (division into main
sections):
1. The macroform of the piece listened to which obtains the highest frequency of replies will correspond to one of the three macroforms
proposed by the analytical group.
2. The results obtained from the subjects who are musicians will have a significantly higher correspondence to the analytical proposals than those obtained from the non-musicians.
Seconds: 1/7 | 8/13 | 14/25 | 26 | 27/34 | 35 | 36/41 | 42/46 | 47/54 | 55/56 | 57/65 | 66/72 | 73/82 | 83/92 | 93/100 | 101 | 102/108 | 109/113 | 114/end
Bars: 1-3 | 4-5 | 6-10(3) | 10(4) | 11-13(3) | 13(4) | 14-16(3) | 16(4)-18(3) | 18(4)-20 | 21 | 22-25 | 25(2)-27 | 28-30 | 31-33 | 34-36 | 37(1)-37(3) | 37(4)-39 | 40-41(3) | 41(4)-end
Range: A | B | C | D | E | F | G | H | I | L | M | N | O | P | Q | R | S | T | U
Participants (musicians)
1 B E G O S
2 E G I M O P S
3 B E G O S
4 B C E G I L M P Q S T U
5 E M O S
6 E M O
7 G O
8 E O S
9 E G I M O S
10 E I O P
11 E G I O
12 A B E G I
13 E
14
15 E P
16 A B E I O P S T U
17 A E G I M O
18 G M O
Participants (non-musicians)
1 C I P T
3 B E G I L O
4 E I O
5 E Q
6 E O T
7 E G M O S
8 B E O
9 G O
10 E O S
11 E G O T
13 B E H O P T U
14 E G O P
16 E I S
17 E I O
18 E G I O S
19 C E G
20 B E I O P Q T U
21 G O S
22 E I O S U
23 E G O P
24 B C E M O S T U
25 B C E G O S T U
26 C E I M O P Q S T U
27 A E I O
28 E I Q
Analysts' macroforms
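The correspondence between the seconds categories and the letter labels used in the table above can be sketched as a small lookup function (an illustrative reconstruction from the table header, not part of the study's software; treating click times as whole seconds is an assumption of this sketch; categories J and K do not appear in the table):

```python
# Seconds-range -> letter-category mapping reconstructed from the table header.
BOUNDS = [
    (1, 7, "A"), (8, 13, "B"), (14, 25, "C"), (26, 26, "D"), (27, 34, "E"),
    (35, 35, "F"), (36, 41, "G"), (42, 46, "H"), (47, 54, "I"), (55, 56, "L"),
    (57, 65, "M"), (66, 72, "N"), (73, 82, "O"), (83, 92, "P"), (93, 100, "Q"),
    (101, 101, "R"), (102, 108, "S"), (109, 113, "T"),
]

def category(seconds):
    """Return the letter category for a segmentation click at `seconds`."""
    for low, high, label in BOUNDS:
        if low <= seconds <= high:
            return label
    return "U" if seconds >= 114 else None  # U = 114 s to the end of the piece
```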
We elaborated an empirical measure, or index, to check how near or far the perceived macroforms (those of the subjects) were from those proposed by the analysts. In each perceived macroform we checked all the points of subsegmentation and compared these points with those in the analysts' macroforms. We adopted the following formula: I = (c*100)/A, where c is the number of points coinciding with the points in the analysts' macroforms (macroform 1, macroform 2 and macroform 3) and A is the number of all the segmentation points indicated by the participant. If A is smaller than the number of proposed points in the analysts' macroforms, then I is not computed.
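Expressed as code, the index might look like this (a minimal sketch; representing each macroform as a set of letter categories, and the function name, are assumptions of this illustration, not the authors' implementation):

```python
def correspondence_index(perceived, analyst):
    """Empirical index I = (c * 100) / A, where c is the number of a
    participant's segmentation points that coincide with points in an
    analyst's macroform, and A is the total number of points the
    participant indicated. Returns None when A is smaller than the
    number of proposed points, the case in which I is not computed."""
    perceived, analyst = set(perceived), set(analyst)
    A = len(perceived)
    if A < len(analyst):
        return None  # I is not computed in this case
    c = len(perceived & analyst)
    return c * 100 / A

# Illustrative values only, using letter categories as in Table 1:
print(correspondence_index({"E", "G", "O", "S"}, {"E", "O"}))  # 50.0
```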
Table 2. Correspondence of macroforms proposed by the musical analysts and macroforms perceived by our participants

Participants            N     Mean Index of Correspondence    Std. Deviation
Musicians
  Macroform 1          16     28.9583                         15.7894
  Macroform 2          14     48.1548                         18.3280
  Macroform 3          12     47.8472                         18.5948
It can be seen from Table 2 that macroform n. 2 (I = 48.1548, musicians; I = 52.4127, non-musicians) is the one closest to the perceived macroforms in musicians as well as in non-musicians. The results for macroform n. 3 are slightly lower (I = 47.8472, musicians; I = 43.2494, non-musicians), while macroform n. 1 is clearly the furthest from the perceived macroforms (I = 28.9583, musicians; I = 32.4009, non-musicians). The results obtained for macroform n. 1 can be explained if we consider that many participants did not indicate the 47/54-seconds category as a point of segmentation. There are no significant differences in the I values between musicians and non-musicians according to the Mann-Whitney test:
therefore our second experimental hypothesis, which predicted that musicians would give answers closer to the hypotheses made by the analysts,
has not been verified.
Table 1 also shows other aspects of the perception of macroforms that are quite interesting, especially considering the results that are clearly far from the analysts' predictions.
For example, one of the points of segmentation chosen by many subjects, concerning bars 14-16 (36/41 seconds category), was not considered
suitable by the analysts. Our participants did, in fact, perceive bars 12-14 as a section; these bars, on account of their brevity, had been considered
by the analysts as belonging to the following section of the piece, even though they are quite well discernible due to the contrast between "ostinato"
and "non-ostinato". The rhetorical function of a "bridge" may also have been attributed to these bars, as had been done by the analysts. Our results show that the internal characteristics of the sections, and therefore their homogeneity and contrast with the neighbouring sections, seem to be more important for the listener than the duration of each section as far as the macroform is concerned: in any case, listeners seem to prefer such elements when they are given a choice. In this sense the analysts' basic hypothesis, which proposed that the macroform of Kurtág's passage depends on the presence or absence of the "ostinato" (contrast between "ostinato" and "non-ostinato") and on the content homogeneity of the individual sections, predicted that these aspects would also operate in the minds of the subjects and thus be reflected in their answers and macroforms. The results seem in this case to support the idea that, although the section is rather small, the listeners were able to memorize not the presence of a cue (Deliège), represented by the "ostinato", but rather its absence. Furthermore, the somewhat brief duration of the section could have allowed the listeners to perceive the rhetorical function of these bars as a "bridge", as had been hypothesized by the analysts.
A similar case can be observed at bar 25(2), where macroform n. 3 proposes the beginning of a new section (bars 25(2)-28), characterized by the suspension of the "ostinato". In this case our subjects did not indicate the beginning of the section (in fact there are no answers in this category): they simply stopped indicating the repetitions of the "ostinato", as they had done up to the preceding bar (see the answers in the preceding category). The total absence of indications in these bars, where the "ostinato" disappears, compared to those given in the preceding bars, which coincide with the entrances of a very compact series of "ostinati", tells us that the subjects perceived these bars as different from the previous and following ones, even though they did not indicate them in particular and did not consider them a section of the passage, simply because of the lack of "ostinati".
Many participants marked the beginning of a new section in the 83/92 seconds category, at bars 31-33, whereas none of the three analysts'
macroforms regarded these bars as segmentation points. The analysts' macroforms give the beginning of a new section at the end of bar 29, where,
after the conclusion of the preceding section with notes of long duration, an "ostinato" which had already been proposed in preceding sections
returns. Perhaps many listeners wanted to wait and be sure that the "ostinato" was not there "by chance", but was actually the first element of a
series characterizing a new section. Only when the "ostinato" had been repeated twice did the listeners register the beginning of a new section. This
event has been studied by Deliège and El Ahmadi (1990), who hypothesized the existence of a lapse of time during which listeners must decide if
the new cue that allows them to identify a new section will come again or will be left out (the "tiling" zones). The tendency towards the perception of segmentation points within the 83-92 seconds category, and not in the previous one, is still more evident in tests n. 2 and n. 3 (which will not be discussed in this paper), where there are more answers in the 83-92 seconds category than in the previous one. It should be borne in mind that in tests n. 2 and n. 3 the subjects had listened to the piece fewer times than in test n. 5. This leads us to conclude that most of the subjects are able to anticipate the beginning of the section at bar 29, without waiting for confirmation of the presence of the
"ostinato", after many listenings (6) and therefore only after the memorization of the passage.
Finally, another difference between the macroforms of the analysts and those of the listeners can be found in the sections indicated in the final bars (102/108 seconds, bars 37(4)-39; 109/113 seconds, bars 40-41(3); 114/end, bars 41(4)-end). Following the rule of the "ostinati", the
analysts decided on a single section from bar 29 (3) until the end of the piece, since the same "ostinato" is repeated eight times and each time is
clearly recognizable. In the final bars, however, the "ostinato" is interrupted by long pauses which may have induced the subjects, both musicians
and non-musicians, to identify sections. The presence of these pauses certainly creates a strong case for segmentation: one may wonder, however, if
the pauses by themselves are able to create these sections or whether their contextual rhetorical function of closing the piece is somehow involved.
Besides, the pauses may also act as an element of variation in the repetition of the "ostinato", introducing discontinuity that may have led to the
perception of the segmentations. These results support Deliège's studies on the relationship between variant and invariant elements in the
memorization of a heard musical passage (Deliège and Mélen 1997).
CONCLUSIONS
The three macroforms hypothesized by the analysts represented the basic framework for many of the macroforms perceived by the listeners: the
second macroform in particular represented a kind of macroform prototype that was perceived, with certain variations, by all listeners. A
macroform prototype is, therefore, at the basis of many macroforms actually perceived by the listeners. The contrast between "ostinato" and
"non-ostinato" seems to have been the principal criterion for subdividing the piece into different parts for both listeners and analysts, thus bringing
about the tendency towards the macroform prototype proposed by the analysts. In this sense, the presence or absence of an "ostinato" (even though
not always the same "ostinato") may have acted as a recognition "cue" during the memorization of the macroform of a heard musical passage
(Deliège).
The variations observed in the actually perceived macroforms seem to depend on a series of preferential choices that the subjects made by applying
in a more or less consistent manner the criteria of repetition (indicating a section for each repetition of the same "ostinato" or only the first time that
the "ostinato" presented itself), the rule of difference-sameness (such as in the case of the pauses inserted in the last bars among the repetitions of the
same "ostinato") and the rhetorical rule of the bridge and of conclusion.
The differences between the two groups of subjects, musicians and non-musicians, are not significant. This result is in line with the findings of our study group, as well as with the results of research by Deliège and Imberty. Above all, it would seem to support the hypothesis that, at some levels of analysis (macroform), the competences possessed by musicians do not affect the memorization of a musical passage and the perception of a macroform. We could observe, however, that the non-musicians occasionally gave solutions closer to the analysts' macroforms, particularly to macroform n. 2. Therefore, although there are no significant differences between the two groups of participants, the tendency that emerges from the answers leads us to suppose that the criteria used by the analysts in hypothesizing their three macroforms are nearer to a perceptive analysis than to an analysis of the musical score. However, at least the initial presupposition of the analyses has been respected: i.e., that the musical scores would be able to offer us clues explaining (with the help of data inferred from the analysis of the structures) the reasons for the segmentations and the divisions into parts proposed by the listeners.
References
Addessi, A. R. and Caterina, R. (2000). Perceptual musical analysis: segmentation and perception of tension. Musicæ Scientiæ, in print.
Bigand, E., Parncutt R. and Lerdahl F. (1996). Perception of musical tension in short chord sequences: the influence of harmonic function,
sensory dissonance, horizontal motion, and musical training. Perception and Psychophysics, 58/1, 125-41.
Cross, I. (1998). Music analysis and music perception. Music Analysis, 17 (2), 3-20.
Deliège, I. (1989). A perceptual approach to contemporary musical forms. In S. McAdams and I. Deliège (eds), Music and Cognitive
Sciences. Contemporary Music Review, 4, 213-230.
Deliège, I. and El Ahmadi, A. (1990). Mechanisms of cue extraction in musical groupings: A study of perception on Sequenza VI for viola
solo by L. Berio. Psychology of Music, 18 (1), 18-44
Deliège, I., Mélen, M., Stammers, D. and Cross, I. (1996). Musical schemata in real-time listening to a piece of music, Music Perception, 14
(2), 117-160.
Deliège, I., Mélen, M. (1997). Cue abstraction in the representation of musical form. In I. Deliège and J. Sloboda (eds), Perception and
Cognition of Music (pp. 387-412). Hove: Psychology Press.
Dibben, N. (1999). The perception of structural stability in atonal music: The influence of salience, stability, horizontal motion, pitch
commonality, and dissonance. Music Perception, 16 (3), 265-294.
Imberty, M. (1981). Les écritures du temps. Sémantique psychologique de la musique (tome 2). (Le scritture del tempo. Milano:
Ricordi-Unicopli, 1990). Paris: Bordas.
Imberty, M. (1987). "L'occhio e l'orecchio. Sequenza III di Berio". In L. Marconi and G. Stefani (eds), Il senso in musica (pp. 163-186).
Bologna: CLUEB.
Imberty M. (1993). "Teorie musicali e teorie della memoria". In M. Baroni, M. Imberty and G. Porzionato, Memoria musicale e valori sociali,
«Quaderni della SIEM», 4, Milano: Ricordi.
Lerdahl, F. (1989). Structure de prolongation dans l'atonalité. In S. McAdams and I. Deliège (eds), La musique et les sciences cognitives (pp. 103-135). Bruxelles: Mardaga.
Meyer, L. B. (1996). Commentary. Music Perception, 13/3, 455-84.
Krumhansl, C. L. (1996). A perceptual analysis of Mozart's Piano Sonata K 282: Segmentation, tension and musical ideas. Music Perception,
13/3, 401-32.
Proceedings paper
Introduction
Practising is an all-important part of instrumental study. Does this practising require some sort of planning? And does practice planning improve the instrumental achievement of students in higher instrumental study? If we consider practising an activity directed by aims, it is highly relevant to ask how students plan and what effects planning has on achievement.
It is important to distinguish between at least two levels, or domains, of planning. One is the planning inherent in the formulation of performance aims and means during practising, and the development of mental representations for performance. This is a domain that has attracted growing research interest (see Sloboda (1982) and Gabrielsson (1999) for overviews of research, and Nielsen (1999) and Sullivan and Cantwell (1999) for recent examples).
The other planning domain, on which this study is based, is the overall planning of practice activities. By this I mean questions such as how students co-ordinate practice sessions in relation to other study activities, when and how they plan their practice activities, why they plan, etc. These are more global features of practice planning, and they correspond to the planning activities carried out by teachers. That is why research on teachers' planning activities, and on how teachers think about planning, is relevant to this project. There is, however, no research relating teachers' «achievements» to their planning behaviour. The only field with research questions comparable to mine is the study of time management behaviours among college and university students. I will return to these in my concluding discussion.
Research on this global aspect of instrumental practice planning is mostly neglected. Two previous reports from the research project presented here have concentrated on different types of planning behaviour among higher instrumental students (Jørgensen, 1997a) and on their time perspective in planning (Jørgensen, 1997b). For professional musicians, there is a study by Hallam (1997) in which the organisation of practice forms part of the investigation. There is, however, no previous study of the relationship between global aspects of practice planning and achievement.
The study
The participants were students at an Academy of Music, in its four-year undergraduate program. They were enrolled in the instrumental, vocal and church music institutes. Planning behaviour was registered through a questionnaire. All questions related to a «normal» study week or study period, excluding periods where examinations etc. may disturb the regular, usual type of behaviour. Instrumental achievement was measured as the instrumental performance grade on the major instrument in the 2nd and 4th (final) year of study.
Grades are given on a five-point scale, where 1 is best and 5 is «fail». An examination concert is the context for giving the grade. All except one student in this study got a grade on one of the three highest levels. This leaves us, for all practical purposes, with three grade groups: the «excellent» (1), the «very good» (2), and the «acceptable» (3).
My research questions are:
1. Do students in different grade groups differ in their co-ordination of practice sessions with other
study activities?
2. Do students in different grade groups differ in when they carry out planning activities?
3. Do students in different grade groups differ in their time perspective in planning practising on
repertoire and technical exercises?
4. Do students in different grade groups differ in respect to systematic planning?
The study has been carried out over several years, and some research questions have been replicated.
Results
Coordination of practice with other study activities
On busy study days and weeks, the students usually attend several classes and rehearsals. Most of these activities are on their weekly schedule, with fixed times for each, while some are not, being organised more ad hoc. This leaves the time between the scheduled activities for practising, and the students have to coordinate their practice sessions in relation to their other study activities. The question to the students concerned the time perspective of this coordination: Did they include practice sessions in their week-plan? Did they coordinate practice sessions in relation to other activities at the beginning of each study day, taking one day at a time? Or did they fit them in during the course of the day, without any previous planning? When answering, they had to choose the alternative that best fitted their own behaviour. This posed a problem for some, who commented that their study weeks were so different that they used all three alternatives over a period of time. The answers will accordingly reflect a form of forced choice for some students, but I do not consider this a serious threat to the validity of the question.
Results from this part of the study are from 1991. The dominant type of coordination behaviour, for students from all three institutes and all four study years, is to plan the coordination at the beginning of each study day, taking one day at a time. This was the case for 58% of the instrumental students (N=78), 46% of the vocal students (N=11) and 41% of the organ (church music) students (N=17).
For students in their 2nd and 4th study year, a proportion of 10-15% in all three grade groups coordinated their practice sessions by including them in their week-plan. There is, however, a difference between students with the lowest grade (3) and those with the two highest grades (1 and 2): among the former, more than 50% fit in their practice sessions during the course of the day, while only 24-29% of the latter do so. The tendency, accordingly, is that the majority of students in the two highest grade groups coordinate their practice one day at a time, while the majority of students in grade group 3 fit in their practice sessions during the day, with no previous planning. The differences are not statistically significant by chi square (2nd study year: N=27, chi square=5.682, df=6, p=0.460; 4th year: N=23, chi square=4.246, df=4, p=0.374). My conclusion is that:
● Students in different grade groups did not differ significantly in practice coordination activity,
with the exception of a tendency for students in the lowest grade group to fit in practising
during the day, without previous planning.
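As an illustration of the kind of test reported above, the Pearson chi-square statistic for a grade-group by coordination-type contingency table can be computed as follows (the counts shown are invented for illustration, not the study's data; in practice a library routine such as scipy.stats.chi2_contingency would normally be used):

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Invented counts: 3 grade groups (rows) x 3 coordination types (columns)
stat, df = chi_square([[4, 9, 3], [3, 7, 2], [1, 2, 5]])
```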
Time period for planning
When do students plan? Planning can be carried out at quite different points in time: before, during, and after practising. Are there differences between students in different grade groups in their utilisation of these time periods for planning? The five periods I concentrated on were «before a practice day», «at the beginning of a practice session», «during practising», «shortly after practising», and «between practice days and practice sessions». The students were asked whether they «always», «often», «sometimes», «seldom» or «never» used each of these time periods for planning.
This part of the study is based on information from the students in 1995 and 1996. For all students in
the three institutes (N=109), 55% planned «always» or «often» before a practice day in 1995, with
28% «sometimes» and 17% «seldom» or «never». The distribution for planning in the beginning of a
practice session was 76% with «always» or «often», 13% «sometimes» and 11% «seldom» or
«never». 50% planned «always» or «often» during practising, with 30% «sometimes» and 21%
«seldom» or «never». Shortly after practising 24% planned «always» or «often», with 31%
«sometimes» and 45% «seldom» or «never». And between practice days and practice sessions 34%
planned «always» or «often», 33% «sometimes» and 34% «seldom» or «never». We can see that the
different time periods had different popularity among the students, with planning immediately
before a practice session as the most popular period for planning, and the period immediately after
practice sessions as the least popular. The distribution in 1996 was very similar.
The analysis was carried out with information from the two different student populations (1995 and
1996), each of them with students getting a grade in their 2nd study year, and others getting a grade in
their 4th and final year. This established four groups for analysis for each of the five practice
behaviours. Based on chi square analysis of differences between the grade groups in each of the four
analysis groups, I got the following results (with p-values from the four groups):
Students in the three grade groups did not differ significantly in their tendency to plan:
● Before a practice day (1995: p=0.551 (2nd year, N=28), p=0.617 (4th year, N=27); 1996:
p=0.069 (2nd year, N=15), p=0.174 (4th year, N=24))
● In the beginning of a practice session (1995: p=0.611 (2nd year), p=0.281 (4th year); 1996:
p=0.760 (2nd year), p=0.333 (4th year))
● During practising (1995: p=0.463 (2nd year), p=0.567 (4th year); 1996: p=0.216 (2nd year),
p=0.332 (4th year))
● Shortly after practising (1995: p=0.356 (2nd year), p=0.373 (4th year); 1996: p=0.933 (2nd year),
p=0.295 (4th year))
● Between practice days and sessions (1995: p=0.258 (2nd year), p=0.920 (4th year); 1996:
p=0.947 (2nd year), p=0.892 (4th year))
Systematic planning
The students were also asked: «Do you regard yourself as a person who uses practice planning in a
systematic way?». The answers were approximately normally distributed, with 5% regarding
themselves as «very systematic planners», 20% as «very systematic to average systematic», 50% as
«average systematic», 18% as «average systematic to very unsystematic», and 7% as «very
unsystematic planners». My research question was now: Do students in different grade groups differ
with respect to systematic planning?
Looking at the three different grade groups in each of the four analysis groups (see above), the main
conclusion is that:
● there is no significant difference between grade groups in their evaluation of their own use of
planning in a more or less systematic way (p=0.561, 0.510, 0.235 and 0.291 for, respectively,
year 2 and 4 in 1995, and year 2 and 4 in 1996)
Discussion
The main result may seem surprising: There seems to be no systematic and statistically significant
difference between students with different grades regarding several types of practice planning
behaviour.
Since this result is from an exploratory project in a field with no previous research, we have to look at
research on other students' planning behaviour, outside music, for comparison and discussion. Even
here the research activity is very limited, but there are some studies of students' time management
behaviours and their academic performance. Macan et al. (1990) developed a «Time Management
Behavior Scale», based on «tips, ideas, and techniques repeated throughout several how-to books on
time management» (op.cit. 761). When they related the students' grade point average to the overall
score on the scale, the correlation was 0.23. Correlations between grade point average and the four
factors were: «Setting goals and priorities», 0.10; «Mechanics - Planning - Scheduling», 0.20;
«Perceived control of time», 0.22; and «Preference for disorganization», 0.17. All correlations are
positive, indicating a positive relationship between certain types of planning behaviour and academic
achievement. The values are, however, so small that their main message is that this relationship is
negligible.
Britton and Tesser (1991) developed a time-management questionnaire with 35 items, each answered
on a 5-point scale. Their theory was derived from research on computer operating systems, and based
on the supposition that the information-processing resources of college students are managed by some
mental system analogous to the time-management component of a computer's operating system. They
used «cumulative grade point averages» over all four college years as a dependent measure, and
identified three factors in the questionnaire. The correlations between grade points and the three
factors were: for «Short-range planning», 0.25; for «Time attitudes», 0.39; and for «Long-range
planning», -0.10. The first and last of these correlations are negligibly low, while the «time attitudes»
factor shows a positive and sufficiently high correlation to be of interest. I will return to this factor.
A third study, by Trueman and Hartley (1996), is also relevant for my discussion. They used a
shortened version of Britton and Tesser's scale on students in psychology in a British university. The
correlation between academic performance (on first year examination) and the whole scale was 0.15,
between academic performance and «Daily plan» it was 0.04; and between academic performance and
«Confidence in long-term planning» it was 0.19. All of these correlations are negligibly small.
These three research efforts from study contexts other than instrumental music, in my view, support
my own conclusion: There is no general and systematic relationship between certain types of planning
activity and academic achievement. Even if there are several limitations in my research and in the
three reported time-management studies, both in the measures of dependent and independent
variables and in the possible neglect of important aspects of planning, the low correlations in the
time-management studies among college and university students, together with the non-significant
differences between achievement groups in practice planning behaviour among the instrumental
students, suggest that there is no general and strong relationship between planning and achievement
that is relevant for all students. The most important result from the time management studies is the suggestion given by the
«Time attitudes» factor in the Britton and Tesser study. This factor suggests that for many of the
students, it is more important how they experience their own control (or lack of control) over their
study time, than how they manage and plan the distribution and use of this time.
References
Britton, B.K. and Tesser, A. (1991). Effects of time-management practices on college grades. Journal
of Educational Psychology, 83, 405-410.
Gabrielsson, A. (1999). Music Performance. In: Deutsch, D. (Ed.), The Psychology of Music. 2nd ed.
San Diego: Academic Press.
Hallam, S. (1997). Approaches to instrumental music practice of experts and novices: Implications for
education. In: Jørgensen H. and Lehmann A. C., (Eds.). Does practice make perfect? Current theory
and research on instrumental music practice, pp. 89-107. Oslo, Norway: Norges musikkhøgskole.
Jørgensen, H. (1997a). Higher instrumental students' planning of practice. In: Proceedings, Third
Triennial ESCOM Conference, pp. 171-176. Uppsala, Sweden, 7-12 June 1997.
Jørgensen, H. (1997b). Higher level students' time perspective in planning instrumental and vocal
practising. In: Proceedings, IV International Symposium of RAIME, pp. 52-61. Dundee: Northern
College.
Macan, T.H., Shahani, C., Dipboye, R.L. and Phillips, A.P. (1990). College students' time management:
Correlations with academic performance and stress. Journal of Educational Psychology, 82, 760-768.
Nielsen, S. (1999). Regulation of learning strategies during practice: A case study of a single church
organ student preparing a particular work for a concert performance. Psychology of Music, 27,
218-229.
Sloboda, J.A. (1982). Music Performance. In: Deutsch, D. (Ed.), The Psychology of Music. New
York: Academic Press.
Sullivan, Y.M. and Cantwell, R.H. (1999). The planning behaviours of musicians engaging traditional
and non-traditional scores. Psychology of Music, 27, 245-266.
Trueman, M. and Hartley, J. (1996). A comparison between the time-management skills and academic
performance of mature and traditional-entry university students. Higher Education, 32, 199-215.
Proceedings paper
Introduction
There is a substantial amount of psychological research investigating human "musical" timing. This
research has been carried out for more than a century, and most of it adopts an experimental approach.
Examples of such research areas are: frequency regions in relation to perception and performance of
beat; synchronization abilities; perception and performance of changes in tempo; expert behaviour
compared to untrained; differences between perception and performance; personal spontaneous
tempo; developmental aspects; and effects of different manipulations such as rhythm training and
education, and even administering whiskey to the subjects (Harrel, 1937).
Some of the earliest research efforts in this field focused on stability in different tempo regions and
on synchronization (Stevens, 1886; Dunlap, 1910; Harrel, 1937). One main finding is that there seems
to exist a range of frequencies that is consistently regarded as easy to perceive, easy to perform, and
experienced as a possible foundation for the "experienced beat". This frequency span is typically found
approximately between 60 and 120 bpm, sometimes higher. The centre frequency, around 80 bpm, is
often regarded as the best suited for these different tasks (Stevens, 1886; Brown, 1979; Fraisse, 1982;
Grieshaber, 1987; Duke, 1989).
Previous research has also suggested a concept called "spontaneous tempo", "personal tempo", or
"mental tempo": a voluntary tempo that is characteristic of the individual (Fraisse, 1982).
In such research it has been argued that different individuals have their own typical and specific
spontaneous tempo, which can be expected to be relatively invariant over time.
Earlier research, unfortunately, has for the most part focused on adults (Pouthas, 1996) and is most
often aimed at producing results on a general level.
In practising music and in music education it is usually assumed that children in a class or a group
can experience the tempo, or the pulse, in such a way that they can act together upon that experience in
musical activities, e.g. singing, playing and dancing.
Tempo is of vital importance in both the experience of music and in the performance of it.
Synchronization is of vital importance in anticipating musical events and the ability to make music,
alone or especially together with others. Individuals differ, however, quite markedly in their
synchronization to the music played.
Practically all research on the subject also confirms this individual variability in synchronization. The
timing precision, nevertheless, is generally found to be very high, in the range of tens of
milliseconds (e.g. Dunlap, 1910; Bartlett & Bartlett, 1959; Keele & Ivry, 1988; Franek, Mates, Radil,
Beck & Pöppel, 1991). Fraisse (1982) states that synchronizing is also possible in cases of more
complex rhythms and in cases of accelerating and decelerating sounds, even if this diminishes the
timing precision.
The present investigation focuses on these two major aspects of time in music, tempo and
synchronization.
The purpose is to investigate the concepts of spontaneous tempo and synchronization to regular
external stimuli, to see if children exhibit individually stable ways of handling these concepts and how
these behaviours may change over time. The concept of synchronization is investigated in two ways.
The first is synchronization to a steady tempo and the second is synchronization to a slowly changing
tempo, which was intended to show how synchronization is performed over a greater frequency
region.
Three fundamental questions were asked:
1. Is there individually typical stability to be found regarding these two concepts in 8-year-old school
children?
2. If such stability is found, to what extent and in what way does it vary between the children?
3. If such stability is found, does it still exist five years later, or in what respects has it changed?
Method
The group in the investigation consisted of two school classes from two schools in the same
neighbourhood in Arvika, Sweden. There were 18 girls and 12 boys.
The thirty children were tested in 1992, when they were eight years old, and again in 1997; both
test years involved three test sessions.
The research design adopts an experimental approach in which children in an individual setting are
tested regarding spontaneous tempo and synchronization to external pulse flow. The two concepts are
measured by computer and the external pulse flow is also computer generated.
The test equipment consists of a computer with a MIDI pad and a sound module connected to it. The
sound module is adjusted to produce the distinct sound of a snare drum. This is amplified and played
back through a speaker. The drumpad is put on a table at a height that enables the children to beat the
pad with an ordinary drumstick in a comfortable manner while standing.
Findings from an earlier investigation (Hugardt, 1987) indicated that, in an individual setting,
children could produce a spontaneous, steady pulse flow when beating a drum. In the present
investigation, the drum was substituted by a drum-pad connected to the computer.
The computer allows for very accurate measurements and is reasonably portable, so that the
investigation could be carried out in the schools where the children are, rather than in a laboratory.
A computer program, specially designed for this investigation, was developed. This software
measures and analyses the spontaneous tempo, synchronization to steady pulse flow, and
synchronization to slowly changing pulse flow.
Results
High individual stability was detected both in spontaneous tempo and synchronizing behaviour in
both the 1992 and the 1997 test sessions.
Spontaneous tempo
The individual stability in spontaneous tempo was generally higher in the 1992 test results.
The measure used to express deviation in spontaneous tempo from occasion to occasion (or stability
in spontaneous tempo) was the mean deviation from the individual mean frequency; this was 10.7%
in 1992 compared to 13.8% in 1997.
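The stability measure described above, the mean deviation from the individual mean frequency expressed as a percentage of that mean, can be sketched as follows; the three session tempi are hypothetical.

```python
def mean_deviation_percent(tempi):
    """Mean absolute deviation from the individual mean tempo,
    expressed as a percentage of that mean."""
    mean = sum(tempi) / len(tempi)
    return 100 * sum(abs(t - mean) for t in tempi) / (len(tempi) * mean)

# Hypothetical spontaneous tempi (bpm) from one child's three sessions:
stability = mean_deviation_percent([140, 152, 146])
print(f"{stability:.2f}%")  # ≈ 2.74%
```

A child with this profile would count among the 18 (in 1992) deviating less than 10% from occasion to occasion.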
The mean spontaneous tempo for all children was fast in both years compared to earlier suggestions of
around 100 bpm (Fraisse, 1982). In 1992 it was 144.6 bpm and in 1997 it was slightly faster, 149.6
bpm.
The ranges in spontaneous tempo in the 90 measurements were remarkably similar in the two years:
between 52 and 297 bpm in the 1992 measurements and between 59 and 272 bpm in 1997.
In 1992 there were 18 children deviating less than 10%, compared to 16 in 1997, and there were six
children deviating more than 20% in both years. These were, however, not the same children in both
years. The conclusion was that even if the deviations between occasions were greater in 1997 than in
1992, the increase was only about 3 percentage points. The absolute amount of deviation from the
individual mean frequency for most children in both test years indicated consistency in spontaneous
tempo performance. This was supported by ANOVA tests on the three test sessions in each year, and
on all six test occasions in 1992 and 1997 together, which all revealed significant differences between
the children, indicating individual stability in performance in each test year and also in the
longitudinal perspective.
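The ANOVA logic here is a one-way analysis treating each child as a group, with the test occasions as repeated observations; a significant between-child difference indicates stable, distinct individual tempi. The tempi below are hypothetical.

```python
from scipy.stats import f_oneway

# Hypothetical spontaneous tempi (bpm) over six test occasions,
# one list per child (illustrative values only):
child_a = [140, 145, 150, 142, 148, 151]
child_b = [90, 95, 88, 93, 91, 96]
child_c = [200, 210, 205, 198, 207, 212]

f_stat, p = f_oneway(child_a, child_b, child_c)
print(f"F = {f_stat:.1f}, p = {p:.2g}")
# A small p indicates that the children's mean tempi differ from
# each other, i.e. individually stable, distinct spontaneous tempi.
```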
Synchronization to steady tempo
The children exhibited great differences in synchronizational precision. Despite this, the mean
deviation from stimulus tempo in both test years for all children was below ten percent of the stimulus
tempo.
The most striking difference between the 1992 and the 1997 test results was the dramatic drop in
deviation from the stimulus in synchronizing behaviour, indicating a higher precision in synchronizing
in the 1997 measurements. The average mean deviation for all children was reduced from 7.83% in
1992 to 3.46% in 1997. The standard deviation was reduced from 6.86 to 1.67, indicating both low
deviation and low variation from occasion to occasion in 1997. The range in deviation from stimulus
tempo also decreased: from 2-36% in 1992 to 1-9% in 1997.
It was apparent that it was foremost the children with the highest deviation readings in 1992 that
dramatically reduced their deviation in the 1997 measurements.
Individual stability in synchronization to steady tempo
ANOVA tests on the three test occasions in both 1992 and 1997 indicate that the mean deviation
values for each child were significantly different from those of the other children, indicating
individual stability in this respect. When the results from the six test occasions in both test years are
analyzed together, the ANOVA test still displays significant differences between the children's
performances, indicating individually stable performances even in the longitudinal perspective.
Most children exhibited low variability in their synchronizational performance from occasion to
occasion, both in 1992 and in 1997. In 1992, four children displayed a considerably higher deviation
from stimulus tempo in each test occasion. These most deviating children in 1992 also varied most in
their amount of deviation from stimulus tempo from occasion to occasion. No such extreme result
was to be found in the 1997 test sessions, where all children performed in a much more uniform way.
Synchronization to slowly changing tempo
This part of the investigation displayed great similarities to the steady tempo measurements, in the
differences between the children and in the average amount of deviation.
The deviation from the stimulus in the synchronization to slowly changing tempo was nevertheless
generally a little higher than the deviation in the steady tempo measurements. The deviation had
dropped in the 1997 test sessions compared to 1992, but the drop was far from as big as in the steady
tempo results. The average deviation from stimulus tempo in the slowly changing tempo measurements
only decreased from 9.47% in 1992 to 8.37% in 1997.
There were, just as in the steady tempo measurements, extreme cases of high deviation in 1992,
which exhibited a remarkable drop in deviation in the 1997 measurements. The range between
smallest and largest deviation also decreased: from 4-26% in 1992 to 4-19% in 1997.
Individual stability in synchronization to slowly changing tempo
ANOVA tests revealed that there was, as in the synchronization to steady tempo measurements, a
significant difference between the children in each year. When the test results from both years are
analyzed together, the ANOVA test still displays a significant difference between the children's
performances, indicating individually stable performances even in this longitudinal perspective.
The most deviating children also exhibited the greatest variation in performance from occasion to
occasion in the 1992 test sessions, while this tendency was gone in 1997. This result also corresponds
to the findings in the steady tempo measurements.
Gender results
Substantial differences between boys and girls were found in the synchronizational performances in
the 1992 measurements. The mean deviation from stimulus tempo for girls in the steady tempo
measurements was 6.2%, and for boys 10.3%. The corresponding figures in the slowly changing
tempo measurements were 8.1% for girls and 11.5% for boys.
t-tests on these differences were nevertheless not overwhelmingly significant, with p=0.068 in the
steady tempo measurements and p=0.023 in the changing tempo measurements.
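The gender comparison corresponds to an independent-samples t-test on the children's mean deviations; the values below are hypothetical, for illustration only.

```python
from scipy.stats import ttest_ind

# Hypothetical mean deviations (% of stimulus tempo) in the steady
# tempo task, one value per child (illustrative values only):
girls = [4.1, 5.8, 6.0, 7.2, 5.5, 6.9, 8.0, 4.9]
boys = [9.5, 11.2, 8.8, 12.4, 10.0, 7.9]

t_stat, p = ttest_ind(girls, boys)
print(f"t = {t_stat:.2f}, p = {p:.3f}")
```

With group sizes as small as these (and as in the study, 18 girls and 12 boys), p-values near the 0.05 threshold should be read with caution, which is presumably why the differences are described as not overwhelmingly significant.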
Conclusions
The results from the investigation give the following answers to the questions asked in the
introduction:
There was individually typical stability to be found regarding performance of spontaneous tempo and
synchronization in the 8-year-old school children.
The range in variation between children was large. In the spontaneous tempo measurements the range
was over 200 bpm from the slowest to the fastest tempo. The synchronization to external stimuli also
displayed great differences between the children in their synchronizational precision, and these have
proved to be individually stable.
The stability in performance found in these children in 1992 was still present in 1997. The variation in
individual spontaneous tempo from occasion to occasion was somewhat higher in 1997, indicating a
small drop in individual stability here. The deviation from stimulus tempo in the synchronization task,
on the other hand, was lower in the 1997 sessions, indicating a higher synchronizational precision.
The differences between the highly deviating children and the rest in the synchronization task in 1992
were dramatically reduced, giving a much more uniform picture in 1997.
Discussion
The increase in accuracy in synchronization to external stimuli with age has been documented earlier
(e.g. Grieshaber, 1987), and it is interesting to note that this investigation conforms to this notion, and
that the children at the same time maintain their individually stable performance. It is also interesting
to note that children who were stable in high deviation in their synchronization when they were eight
years old have made the greatest change, and dropped most in deviation from 1992 to 1997. The
synchronization to slowly changing tempo exhibits a drop in timing precision compared to the steady
tempo measurements, a result that conforms to earlier findings (Fraisse, 1982). The individually
stable performance is nevertheless present in both synchronization to steady tempo and to slowly
changing tempo.
Altogether, the results suggest the importance of paying attention to individual differences, while the
developmental aspects of the results stress the importance of not regarding individual differences as
static. In practising music and in music education, paying attention to this last remark might be of
vital interest.
References
Bartlett, N.R. and Bartlett, S.C. (1959). Synchronization of a motor response with an anticipated
sensory event. Psychological Review, 66, 203-218.
Brown, P. (1979). An enquiry into the origins and nature of tempo behaviour. Psychology of Music,
7(1), 19-35.
Duke, R.A. (1989). Musicians' perception of beat in monotonic stimuli. Journal of Research in Music
Education, 37, 61-71.
Dunlap, K. (1910). Reactions to rhythmic stimuli with attempt to synchronize. Psychological Review,
17, 399-416.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The Psychology of Music. New York:
Academic Press, pp. 149-180.
Franek, M., Mates, J., Radil, T., Beck, K. and Pöppel, E. (1991). Sensorimotor synchronization: Motor
responses to regular auditory patterns. Perception & Psychophysics, 49(6), 509-516.
Grieshaber, K. (1987). Children's rhythmic tapping: A critical review of research. Bulletin of the
Council for Research in Music Education, 90, 73-81.
Harrel, T.W. (1937). Factors influencing preference and memory for auditory rhythm. Journal of
General Psychology, 17, 63-104.
Hugardt, A. (1987). Puls och rytmik, barn och motorik, ingen är den andre lik. Göteborg: Göteborg
University.
Keele, S.W. and Ivry, R. (1988). Modular analysis of timing in motor skill. In G. Bower (Ed.), The
Psychology of Learning and Motivation, vol. 21. San Diego: Academic Press, pp. 183-228.
Pouthas, V. (1996). The development of the perception of time and temporal regulation of actions in
infants and children. In I. Deliège and J. Sloboda (Eds.), Musical Beginnings: Origins and
Development of Musical Competence. Oxford: Oxford University Press, pp. 115-141.
Stevens, L.T. (1886). On the time-sense. Mind, 11, 393-404.
Proceedings paper
Introduction
It is likely that Schumann referred to the inner hearing that turns sounds into music, and that he, as the
ardent romanticist he was, wanted to maintain intuition at the expense of intellectual reasoning. It may
perhaps be taken as a sign of more prosaic times if attention is instead paid to the conditions that
determine how musicians perceive the vibrations produced by their instruments.
There are several reasons why the musicians themselves are not the best judges of the sounds out of
which they make music. The player, actually producing the music, is apt to "hear" a confluence of
physical sound waves and sensations emanating from the bodily motions that generate these sounds,
and it is also likely that musicians sometimes confuse the actually emitted sound sequence with their
musical intentions - they may hear what they wish to be heard rather than what there is to be heard.
Finally, it is obvious that players more often than not listen to themselves from a peculiar and
misleading acoustic perspective, very different from the one that really should count in professional
work, viz. that of the audience.
The directions of sound propagation and also the frequency-dependent angles of diffusion may be
such that part of the immediately emitted sound is likely not to hit the musicians' ears. This means that
the players' perception of the direct sound is often biased towards low frequencies, and that
musicians are more or less dependent on reflected sound to get an idea of the spectral quality, a
reflected sound which, due to absorption, is impoverished with respect to high-frequency partials. On
the other hand, due to the very short distance to the instrument, players do hear a lot of secondary
sounds associated with the tone production - sounds that, particularly if they have high frequencies,
are not audible at greater distances because of air absorption. Proximity also causes musicians to hear
themselves as very loud in relation to their fellow players (a violin in a string quartet is a case in
point) though perhaps not as loud as they really are (trumpets, if you ask the woodwind players seated
in front of them). Finally, musicians and listeners alike hear a mixture of direct and reflected sound,
but in the ears of the former direct sound is bound to dominate over reflected sound, and therefore
musicians at play are poor judges as regards the effects of reverberation on their playing.
A few further examples may serve to illustrate the problems involved. All brass players (except those
playing the French horn) are behind the bells of their instruments, which is hardly a good position if
you want to get full and reliable information as to your actual sound quality and loudness. Singers are
even worse off since their sound perspective is dominated by low-frequency biased sound
transmitted via bone conduction. Organists and conductors deal with a multitude of different
intensities and sound qualities, and whereas the spread of sound sources might help to separate the
various components, it is still very difficult to form a correct idea of the joint effect of, and the
balances within, the organ registers and orchestral parts as they are heard in the auditorium - indeed,
conductors sometimes step back from the orchestra during rehearsals to find out.
Clever musicians have somehow learnt from long experience to cope with the fact that they cannot
always trust their auditory feedback when they play - at least we like to think that they are not victims
of the peculiarities of their listening conditions. But this experience is hard-earned; there is a lot of
trial and error, and much waste of time, involved in this process of learning.
It is true that ever since tape recorders came into general use, musicians have had equipment at
hand that makes it possible to listen to themselves at a distance. But it seems that tape recorders have
been little used to guide artistic judgement in practising and rehearsals. The reason for this is probably
that one can only listen to tapes afterwards - there is no push to corrective change when it
would be most effective, i.e. during the very act of performing. The ideal thing would be immediate
feedback: to listen to oneself at a distance while playing.
Theoretical considerations
In this paper, a method is proposed, tried out, and evaluated that, in a number of ordinary situations,
makes it possible to judge one's own playing with "distant ears".
In short, the method works as follows. In order to prevent, as much as possible, the player from
hearing the sound from the instrument in the natural, airborne way, he or she wears high-performance
protective earmuffs. The sound is instead picked up by microphones mounted at some distance in the
room and then relayed to earphones in the earmuffs.
In order to work satisfactorily from the perceptual point of view, distant listening must fulfill two
conditions. First, the proximate sound travelling directly from the instrument to the player, and
leaking somewhat through the earmuffs, must seem to be exchanged for the distant sound fed back
from the microphones to the earphones. Secondly, the dominating distant sound must not confuse the
player because of its somewhat delayed arrival.
The first condition implies that the distant sound led back to the ears must have a substantially higher
intensity level than the remainder of the proximate sound, finding its way into the ears in spite of the
efforts to muffle it. Otherwise the proximate sound will not be properly masked.
The intensity of the direct sound inevitably decreases in proportion to the square of the distance from
the instrument. On the other hand, and depending on the amount of absorption, the room will be more
or less uniformly filled with reflected sound. The intensity of the distant sound received by the
microphones is therefore the sum of the direct sound spread from the instrument, and thus reduced in
intensity, and a considerable increment due to reflected sound. Indeed, a few metres away from the
instrument the reflected sound begins to dominate over the ever-weaker direct sound, determining the
sound intensity and making it practically constant no matter the further distance from the instrument.
The microphones should be mounted outside this reverberation radius (which depends on the amount
of reflected sound in the room) if one wants to gain information as to how the music is heard by the
audience; cf. Sundberg 1991, p. 176.
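The paper gives no formula for the reverberation radius, but a common approximation for an omnidirectional source, stated here as an assumption, is d ≈ 0.057·√(V/T60), with room volume V in cubic metres and reverberation time T60 in seconds. The room values below are hypothetical.

```python
import math

def reverberation_radius(volume_m3, rt60_s):
    """Approximate distance beyond which reflected sound dominates
    over direct sound, using the common approximation
    d = 0.057 * sqrt(V / T60) for an omnidirectional source.
    (This formula is an assumption, not taken from the paper.)"""
    return 0.057 * math.sqrt(volume_m3 / rt60_s)

# Hypothetical rehearsal room: 200 m^3, reverberation time 0.8 s.
radius = reverberation_radius(200, 0.8)
print(f"{radius:.2f} m")  # ≈ 0.90 m
```

This matches the paper's observation that the reflected field takes over within a few metres of the instrument, so microphones placed several metres away sample essentially the same field the audience hears.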
First-rate protective earmuffs of the kind used in the present experiments reduce the sound level by
approx. 16 dB at 125 Hz, 23 dB at 250 Hz, 32 dB at 500 Hz, and 39 dB at 1000 Hz. The masking
effect of tones within the same critical bandwidth as the tone to be masked is approx. 20 dB; cf.
Sundberg 1991, p. 67. Excepting perhaps very low tones (which generally have weak fundamentals
anyway), it seems that the intensity difference between the relayed distant sound and the muffled
proximate sound may allow proper masking without excessive amplification of the signals from the
microphones - as a last resort, the relayed sound can of course be amplified until it drowns the
proximate sound.
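This arithmetic can be checked band by band: the proximate sound leaking through the earmuffs is the proximate level minus the attenuation, and the relayed sound must exceed that residue by roughly the 20 dB masking margin. The 90 dB proximate level at the player's ears is a hypothetical value.

```python
# Earmuff attenuation (dB) per octave band, from the text:
attenuation = {125: 16, 250: 23, 500: 32, 1000: 39}
MASKING_MARGIN = 20  # approx. dB needed to mask within a critical band

def required_relayed_level(proximate_db, freq_hz):
    """Minimum level of the relayed distant sound needed to mask
    the proximate sound leaking through the earmuffs."""
    return proximate_db - attenuation[freq_hz] + MASKING_MARGIN

# Hypothetical unmuffled proximate level of 90 dB at the player's ears:
for f in (125, 250, 500, 1000):
    print(f"{f} Hz: relayed level >= {required_relayed_level(90, f)} dB")
```

The required relayed level falls with frequency (94 dB at 125 Hz down to 71 dB at 1000 Hz for this example), which is why only the lowest bands risk needing extra amplification.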
The dual fact that masking implies adding the intensities of the sounds involved, and that
amplification of the distant sound above its actual level may be necessary, means that distant listening
is not suitable for checking the authentic loudness of the direct sound at the location of the
microphones - a minor drawback in most applications since the much more important relative
intensity differences within the distant sound are preserved. When setting the volume of the relayed
sound, proper masking must be the primary consideration; next comes a level that makes possible a
comfortable and attentive study of the distant sound. Only third, and only if it is of any interest,
might one try to adjust the volume so as to approximate the authentic intensity.
Turning to the second condition, the time interval between the muffled proximate sound and the
stronger distant sound is also critical - the distant sound having travelled through the room to the
microphones is bound to arrive at the player's ears somewhat later than the proximate sound. But
double onsets (pre-echoes) must be avoided, and so must any sense of delayed onset in general -
discrepancies between motor and auditory onsets may be gravely confusing for players.
If, however, the time difference between the arrival of the early proximate sound and the late distant
sound does not exceed a certain value, and if the intensity of the delayed distant sound is greater than
that of the proximate sound, a variant of the "precedence effect" may be taken to apply. (Cf. Benade
1976, p. 204; Hall 1980, p. 363.)
The precedence effect as used in public-address systems regulates the relative positions (the times of
sound arrival) and the intensity levels of sound source and loudspeakers, and it means that the
amplified sound is added to the original one in such a way that you localize the sound at the
original source, and take it to start at the onset of the original sound. In order for this
illusory fusion of the two sounds to work properly, two limits must be respected. The
relayed sound must not reach the listener more than approx. 30 ms after the direct sound, and it must
not be more than approx. 10 dB louder than the direct sound.
In this specific application, however, the 10 dB maximum intensity difference between proximate and
distant sound can be disposed of - there is no need to secure correct localization, since both the
proximate and the distant sounds are heard as being within the earmuffs. Respecting only the 30 ms
time-difference limit, the listener will hear the late distant sound as starting at the onset of the early
proximate sound, and the illusion necessary to avoid double or delayed onsets has been attained.
The second condition for distant listening thus stipulates a maximum delay of approx. 30 ms, which in
turn introduces a limit for how far the microphones can be placed from the instrument: since the
velocity of sound is approx. 343 m/s, the distance from instrument to microphones should not amount
to more than approx. 10 meters. But this is more than many rooms measure, and also (for practically
all music rooms) well beyond the reverberation radius, outside which reverberated sound dominates
the aural impression.
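The distance limit follows from a one-line computation, using the speed of sound and the delay limit quoted above (any electronic latency in the transmission chain, ignored here, would shrink the figure further):

```python
SPEED_OF_SOUND = 343.0   # m/s, at room temperature
MAX_DELAY = 0.030        # s, precedence-effect limit quoted in the text

def max_microphone_distance():
    """Largest instrument-to-microphone distance that keeps the relayed
    sound within the ~30 ms precedence-effect window."""
    return SPEED_OF_SOUND * MAX_DELAY

print(round(max_microphone_distance(), 1))  # ~10.3 m, i.e. "approx. 10 meters"
```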
To conclude, it should be observed that the masking situation involved, and hence the criteria of
masking, are somewhat unusual. Masking and masked sound are the same except for the crucial quality
differences, and this sameness is furthermore a necessary condition for the early-onset illusion of the
precedence effect. This implies that for the present purposes, masking is a fact when the ordinary
sound of the instrument, recognized by any musician playing it, is said to be replaced by a distinctly
different sound, associated with how the instrument sounds at a distance. (In practice, the sound
immediately heard to be replaced when the distant sound is switched on is of course the very dull,
muffled proximate sound.)
It thus appears that the distant-listening method might work, but rather than indulging in further
theoretical considerations, suitable equipment, procedures and applications should be tried out in
practical trial-and-error experiments. However, before presenting such a pilot study, another crucial
problem should be briefly mentioned.
There may be some instruments that do not lend themselves to distant listening. Singers, who would
benefit most from the opportunity to judge their voice at a distance, have their instruments inside their
own heads and must therefore be excluded. The low-frequency biased, bone-conducted sound that
determines the impression of one's own voice cannot, of course, be quenched by ear-protecting
devices. The violin, pressed against the jaw-bone, the clarinet, more gently held against the teeth, and
the brass instruments, fed by the vibrations of the lips, might possibly also leak too much sound
directly into the skull. And loud low-frequency sounds in general may be problematic since they are
well transmitted through solids - the vibrations from double basses standing on a podium might for
instance be taken up by the feet and relayed to the ears via the skeleton.
Experimental procedure
It was decided to test the distant-listening method for eight different instruments - violin, violoncello,
flute, clarinet, trumpet, trombone, piano, and organ - as well as for a baritone singer in order to find
out what insights a singer might eventually gain from the experience. The subjects were asked to
prepare and perform a few short excerpts from their repertoire, varying with respect to tempo,
dynamics and articulation.
In addition, and beyond the original plan, distant listening was also tried by some further musicians passing by -
two horn players, a vibraphone player, two more singers (a soprano and a baritone) - and by the author
at the piano. The author, wearing earmuffs with earphones supplying distant sound, sometimes
accompanied the subjects at the piano in order to find out what assistance one might get from distant
listening when it comes to balancing parts in ensemble playing.
Excepting the organ session in St. Andreas Church in Malmö (a rather large, modern church with
fairly generous reverberation) the experiments were carried out in the Rosenberg Hall at Malmö
College of Music, a newly-built concert hall of moderate size, featuring adjustable reverberation by
means of curtains.
The typical set-up of the equipment (it was sometimes changed or simplified) was as follows. A first
pair of microphones were mounted just above the musician's head in order to pick up the proximate
sound from the player's perspective; a second and third pair of microphones were used for the distant
sound. The nearest of these microphones were set up at a distance of 5-6 meters - just outside the
reverberation radius as measured by means of a sound-level meter - whereas the other pair was
mounted as far away as the musicians could accept - beyond a distance of 8-9 meters the subjects
began to notice double onsets. The signals from the various microphones were fed into a
mixer/amplifier, and from there on to a tape recorder and (in the case of distant sound) to the
earphones in the earmuffs.
To test equipment of different technical quality, two kinds of microphones were used alternately to
supply the distant sound - Sony ECM 909A microphones representing good standard quality, and
professional Brüel & Kjær 4006 microphones. At the other end of the transmission chain, the standard
earphones of the Peltor HT7A protective earmuffs were used alternately with no-communication
Peltor H7 earmuffs, combined with Sony MDR E 565(B) free-style earplugs devised for musical use.
And sometimes the Tascam M 216 mixer/amplifier unit was exchanged for the standard amplifier of
the Revox B77 tape recorder. The amplification volume corresponding to equal sound pressure at the
distant microphones and within the earmuffs, respectively, was determined for each combination of
components by means of a sound-level meter (Brüel & Kjær 2225) as well as aurally; this volume was
used as a normal, starting value for amplification in the experiments.
Besides testing different equipment and procedures and learning about the limitations of the method,
the distant-listening experience was also evaluated by the musicians. In addition to reporting on the
perceptual qualities of the various distant-listening conditions, they were asked whether the equipment
or the musical situation was disturbing, and if they considered distant listening musically suggestive -
the latter question leading to talks that turned out to be quite informative.
Since it was of interest to study spontaneous modifications in performance due to distant listening, the
subjects played some of their short music selections first without, and then with the earmuffs
supplying distant sound; these renderings were recorded for subsequent musical evaluation. In order
to simplify comparisons of the playing characteristics, the left and right channels of the tape recorder
registered the sound from the proximate and the distant microphones, respectively.
The purpose of the experiments was to find out if and when the distant-listening method works, and to
have its merits, if any, assessed by the musicians taking part as subjects.
The aim was not primarily to establish the optimal equipment, but rather to test the method in various
technical and musical conditions. Expensive equipment is likely to give the most satisfactory auditory
results, but it was considered more important to try the method at more modest levels, allowing distant
listening to be put to everyday and handy use.
Turning to the musical evaluation of the method, main importance was attached to some core issues in
music making, such as the character of the sound at a distance as opposed to the biased ordinary
sound perspective, questions of loudness balance, and the influence of reverberation on performance.
Special applications will require more sophisticated equipment and procedures. Thus, if you want to
carefully study matters of sound quality, it is of course crucial to use first-rate microphones and
earphones, and also a high-quality amplifier. If you are particularly interested in knowing exactly what a
listener hears, you must ensure that the recording of the distant sound models human binaural hearing
as closely as possible; dummy-head microphones might perhaps be used to this end. And if you want
to try to get some idea of the actual dynamic level of your playing as it is heard at a certain distance, it
is necessary to use a sound-level meter in order to check that the intensity of the sound fed into the
ears equals that received by the microphones.
Turning to the individual musicians' evaluation of the method, but avoiding duplication of frequently
shared points of view, a number of observations made by the subjects will be reported and briefly discussed.
The violinist sometimes noticed a slight over-hearing directly from the instrument. This effect was not
due to bone conduction since it remained when the violin was not in contact with the jaw. (When
playing the piano, the author could also hear some such additional sound when he tipped his head
backwards and slightly to the side.) It seems probable that for certain sound-propagation angles (or
certain positions of the head) sound might more easily leak in under the edges of the earmuffs. When
using the most distant microphones, the violinist had some difficulty co-ordinating properly with the
piano. He found the method especially useful for improving details of bowing technique - he could
distinguish those noise components that really count at a distance and work with them. Engaged in
selecting a new violin, he wanted to use the method in order to compare different instruments with
regard to how they might sound to an audience.
The violoncello player found the distant-sound perspective with its brightness and transparence quite
inspiring, and she was especially interested in the opportunity to get an idea of the acoustic
environment - the curtain at the rear wall was operated to test how a varied amount of reverberation
influenced performance. Both wearing distant-listening equipment, this subject and the author
rehearsed the exposition from the first movement of Brahms' E-minor Sonata, evaluating the
potentials of the method as an aid to achieve a good balance between the instruments.
The first of the two flute subjects was very pleased that distant listening relieved him of the
ordinary condition of hearing himself differently with the left and the right ear. (For flutists the sound is
very loud in the right ear; for violinists it is the left ear that is exposed.) He also
appreciated the possibility to hear to what extent the distant tone was free from noise associated with
the blowing. Having brought his baroque and classical flutes in addition to his modern one, he played
the same passages on all three instruments and found it quite informative to listen to the distant-ear
impressions. He also played notes in different registers and at different dynamic levels on these flutes,
and compared the sound at various microphone distances with the dB-values obtained from the
sound-level meter at these positions.
The other flutist found the distant sound richer in overtones, and used the method to instantly evaluate
the effects of different (both proximate and distant) microphone positions. He also found it quite
useful to study the relationships between various attack articulations and the distinctness of tone onset
as heard at a distance, and to check the balance of multiphonic and whistling effects.
The clarinet player did not find that the difference between distant and ordinary listening was very
great, but used the method with profit to evaluate timbre differences associated with various
fingerings.
The subject playing the trumpet did not hear the distant sound as quite representative, but he found it
very interesting to play with the relaying microphones as far away as 11 meters although the
substantial delay robbed him of the immediate auditory feedback necessary to control the onsets. This
condition reminded him of the fact that trumpet players (and other musicians seated far back in the
orchestra) must play slightly ahead of the conductor's signs in order to make for good joint precision.
(This idea might perhaps be developed into a practising method helping students to acquire a feeling
for the proper degree of temporal "push" in large-ensemble playing. Conducted by an assistant at the
position of the distant microphones, one player is seated close to the conductor whereas the other one,
wearing earmuffs and being seated far away from the conductor, tries to play in exact co-ordination
with his colleague.)
As already mentioned, the trombone player tended to hear both the bright distant sound and some
amount of low-frequency biased sound, a situation that made for intonation problems. The intonation
was appreciably improved, however, when the intensity of the distant sound was raised to secure full
masking. He took the opportunity to test the rule that the listeners' auditory impression of the
trombone is more favourable if the player does not stand face to face with the audience, directly
exposing the listeners to all high-frequency components.
The pianist was perhaps the most enthusiastic. He found the rich, balanced and transparent distant
sound of the grand piano much more attractive than the ordinary sound heard at the keyboard, being
dominated by low-frequency components and by quite a lot of thumping mechanical noise. (The
author can also testify to this - the difference is quite extraordinary.) He also stressed that it was of
great value to hear more of the reverberation in the hall, and he could immediately use this
information to refine articulation and pedaling.
Due to the position of the organ, the distant microphones had to be placed at a substantially greater
distance down at the floor of the church, but in spite of this the organist did not complain about
delayed feedback. (Organists are likely to have acquired great tolerance with respect to late onsets.)
He considered the method profitable for judging registrations - especially those involving both the
subjectively quite offensive Brustwerk just above the organist's head, and the Rückpositiv behind his
back with its shut-off sound. Since it gave an idea of what it takes to achieve an impression of
musically effective silence in a highly reverberant room, he found distant listening useful for checking
articulation and choice of tempo. He also mentioned that organists sometimes do use microphones
suspended under the vaults to gain a distant perspective of the musical events issuing from the gallery;
they tend to use ordinary headphones, not protective earmuffs, however, and thus they are not likely
to mask effectively the proximate sound from the organ or the choir.
Turning finally to the baritone singer, he could only hear the distant quality of his voice as part of a
mixture of relayed and bone-conducted sound, but on the other hand he found it quite interesting to
sing wearing just silent earmuffs. This condition, implying a substantial reduction of emitted sound in
favour of internally transmitted vibrations, offered opportunities to check the head resonances for
various pitches and vowels - a suggestion that deserves to be studied and that may be put to
pedagogical use. Both wearing distant-listening equipment, the singer and the author at the piano
worked with problems of balance: at what accompaniment loudness and in which voice registers does
the pianist run the risk of dominating the singer? Distant listening made it possible to gain an
objective idea of the actual relative intensities involved: habit prompts a singer to stand more or less
with his/her back towards the piano - an unfavourable listening position not only for the pianist,
getting little or no direct sound, but also for the singer, hearing the piano quite loudly. The lid of the
grand piano was sometimes closed or almost so, sometimes against current practice left wide open.
The quality of the piano sound was of course appreciably improved in the latter condition, which did
not necessarily result in drowning the singer. It seems that the custom of closing the piano may derive
more from the fact that the singers, standing in the middle of the acoustic draught, do not want to feel
overwhelmed, than from well-founded considerations with regard to sound balance.
It thus appears that distant listening works in a number of ordinary musical conditions, and also that it
might yield valuable musical insights. Distant listening can of course not be used all the time or even
very often, but frequent use is not necessary in order to gain musical experience and to arrive at
specific musical conclusions. It seems that this method, sparingly applied, may be an important
resource within higher musical education, offering students opportunities for reconsideration of
ingrained performance habits. Distant listening may also be used with some profit when it comes to
certain difficult problems in professional work. While the method may certainly be developed in
various ways to satisfy specific demands, distant listening can already be applied in a variety of
musical situations as a tool for improving professional training and refining artistic efforts.
References
Benade, Arthur H. (1976). Fundamentals of Musical Acoustics. New York: Oxford University Press.
Hall, Donald E. (1980). Musical Acoustics. Belmont: Wadsworth Publishing Company.
Sundberg, Johan (1991). The Science of Musical Sounds. San Diego: Academic Press.
Acknowledgements
I wish to express my thanks to Johan Sundberg (Royal Institute of Technology, Stockholm) and
Anders Jönsson (Department of Audiology, University of Lund) for their constructive interest. I am
also grateful for the open-minded co-operation of my subjects, all teachers or students at Malmö
College of Music: Anders Frostin (violin), Hege Waldeland (violoncello), Anders Ljungar-Chapelon
and Terje Thiwång (flutes), Christophe Liabäck (clarinet), Peter Meyer (trumpet), Mattias Cederberg
(trombone), Andrzej Ferber (piano), Hans Hellsten (organ), and Johan Weigel (baritone).
Proceedings paper
1. Background Over the past ten years, I have worked at describing and interpreting the
body movements of musicians in an attempt to understand the relationships between the
physical control of an instrument, the musical material being performed and the performer's
implicit and explicit expressive intentions. To date, my work has suggested that the interface
between physical execution and the expression of mental states is a subtle and complex
one. For instance, performers appear to develop a specific vocabulary of expressive
gestures, yet these gestures - though perceptually discrete - co-exist and are even
integrated to become part of the functional movement of playing. Additionally, the 'meaning'
of these individual gestures - unlike the specific emblems used to accompany speech -
appears to change dramatically according to context. In parallel with these highly
individualised concerns of function and expression for each performer, there is the matter of
how both musical and extra-musical concerns are coordinated between co-performers using
body movements. There is also the question of how both group and individual concerns are
communicated to the audience.
2. Aim I wish to explore the interaction between individual performance body style, musical
expression and communication in order to understand how a coordinated and meaningful
performance is created. I shall explore this question through case studies of three singers
from different Western styles of performance: classical, jazz and pop.
3. Contribution This work builds on previous research to bring music production and
perception research within a social psychological framework.
4. Implications In theoretical understanding and practical teaching the implications of this
paper are far-reaching, regarding the performance process as a 'tuning-in relationship'
between co-performers and audience - each party familiarising him or herself with the
gestures of the other individuals and interpreting them from a basis of shared stocks of
knowledge.
Extended Paper
Background
Davidson (1993, 1995) demonstrated that information about both structural features and
expressive intention is communicated to observers through body movement. For example,
she showed that performances of the same piece of music with three different expressive
intentions (to perform the piece without expression, with normal expression and with
exaggerated expression) could be distinguished by observers from the performer's body
movements alone. This explains the finding in piano playing that there was both continuously available expressive
information (the swaying motion) and much more local information (for instance, specific
information limited just to two seconds of the performance): that is, some areas of the body
are global indicators of expression whilst other local parts of the body provide more specific
information. In music, of course, there are many potential demands on the player. Adhering
to the score means that there will be differing technical requirements made on the body. This
in turn will affect the presentation of expressive intentions, making some areas clearer
indicators of expression than others. Additionally, key musical structures may be the
individual points around which expression of the intention is most pronounced. Given the
strong evidence that structure and expression are closely related, significant structural
moments are likely to provide the focal points around which specific examples of expression
will be organised, accounting for the very local nature of some of the expressive moments.
All of the above mentioned studies show the critical role of movement and give some
insights into the specifics of the types of movement being used, and even perhaps why the
movements are used. However, on the basis of these data, it would be misleading to imply
that bodily movement in performance can be accounted for simply in terms of the primary
processes of physiology, sensori-motor coordination, and the cognitive mechanisms of
expression. There is a powerful social component to the way in which we use and present
our bodies - in musical performance no less (and arguably rather more) than in other
aspects of our lives. As argued elsewhere (see Clarke and Davidson, 1998), Gellrich (1991)
has shown how a set of specifically learned mimetic movements and gestures furnish a
performance with expressive intention, and suggested that these gestures can have both
negative and positive effects on the production of the performance. In the positive sense,
they can provide the observer with information which assists in understanding the
performance since the gestures can intensify and clarify meaning, even when the movement
itself is 'superfluous' to the production of the musical whole. In other words, there can be a
'surface' level of movement - a kind of rhetoric - which the performer adds to the
performance. On the negative side, if these gestures are not consistent with the intentions of
the performer, they can create physical tensions in the performer, inhibiting technical
fluency, and disturbing observers with the incongruity between the gesture adopted and the
performance intention.
Support for Gellrich's observations about the negative consequences of incongruous mimetic
gestures is found in the work of Runeson and Frykholm (1983) who demonstrated that
covert mental dispositions become specified in movement and can be detected by
observers. Using the simple task of lifting a box, they asked observers to report what they
could see, and discovered that the box weight, and how much that weight differed from the
lifter's expectation about the weight, could be detected. Most relevantly, attempts to give
false information about the box weight were detected by the onlookers. Thus in this case, the
lifter's expectation, the deceitful attempt and the real weight of the box are specified.
Clearly, 'surface' gestures may contribute significantly to the production and perception of a
musical performance. Indeed, a further interpretation of the finding that some two second
excerpts of the pianist's performances in Davidson's study were more richly informative than
others could be that mimetic gestures are used at certain points during the performance, and
that these movements heighten the expressive impact of a specific moment. For instance, a
large head shaking gesture may have occurred which could have had its own distinguishable
form, yet been part of the all-pervasive swaying movement. A fairly extensive literature on
physical gesture in spoken language (cf. Ekman and Friesen, 1969; Ellis and Beattie, 1986)
indicates that gestural repertoires emerge which are associated with specific meanings, and
it could be that the pianist in Davidson's studies had developed specific gestures for
particular musical expression - a gestural movement repertoire.
The current study will explore how and why these identifiable gestures are used, and the
extent to which social factors such as performance etiquette influence the shaping of them.
The current paper builds on Davidson's previous work in that the social context is explored,
examining how style and culture influence the movement patterns. Furthermore, the notion
of a 'centre of moment' for musicians other than pianists will be considered, as the cases here
will be singers.
Methodological Note
All the data used in this study came from video recordings of performances. The study
examines live performances by a classical singer, a jazz singer and a pop singer. These raw
data were subjected to repeated observations by the author to explore the nature of the
expressive gestures used. The first study, an analysis of one of the author's own
performances, provided the grounding for the subsequent analyses. Interpretations of the
analyses were obtained by asking two independent evaluators for their feedback and
commentaries on the raw material and the interpretations of it.
Results
The self-analysis revealed that the classical singer regularly engaged in a forward and
backward rocking motion, shifting her weight from side to side. There were also individual
gestures during the course of the full one-hour programme; these were few in number, but
those used were of seemingly very different types. Observations suggested that
they were expressive of the following:
i) movements directly related to and reactive of material in the texts of the poems - a priest
giving a sermon was portrayed with outstretched preacher-like gestures;
ii) movements linking together sections of the music or ideas between musical passages -
hands in a slow moving 'begging dog' position to connect one phrase end to the opening of
the next song;
iii) gestures with clear technical orientation - a lifting and turning hand and forearm
'illustrating' the action of the soft palate lifting;
iv) movements of direct instructional nature about musical entrances and exits, as signals to
the accompanist - head nods to indicate 'now'.
These movements were regarded as a combination of being:
a) performance process-oriented - to assist the moment-by-moment issues of co-ordination:
making the performance start, remain fairly co-ordinated and finish;
b) expressive of emotional intention;
c) rhetorical in terms of both the narrative of the poem and the music. Additionally, they revealed
a 'story' about how the singer had been trained to move to produce the performance. The
palate-lifting gesture, for example, had presumably been learned in singing lessons and
then been integrated as an expressive gesture in the performance.
The subsequent analyses of the other two singers revealed similar features, in that both included
many of the elements listed above. However, both displayed a lot more in terms of
non-musical performance elements. For example, the jazz performer engaged in dance
movements with her co-performers, while the pop singer engaged directly with the audience,
using her body to make sexually enticing gestures and signals.
Discussion
This work certainly adds to the previous study: by looking at singers, who have text as
well as music to communicate, the different types of movements being used can be more
readily deciphered - linking narrative and gesture, for instance. It is possible that such
differences also occur in instrumental performance, but there it is perhaps more difficult to
differentiate between the narrative gesture and the gesture used for primarily technical ends
but since synthesised into an expressive movement vocabulary (like the palate-lifting
movement used by the classical singer).
Furthermore, it is evident that socially motivated movements are used a great deal in singing
performance: musical content being co-ordinated through interactive movements between
co-performers, but also movements reflecting how singers typically or stylistically move
within a certain performance context.
In a recent paper, Cook (2000) has argued that the movements of a musical performance
show how music and action combine to create a 'different' work, not simply a piece of
music: rather, a performance 'multi-media', as Cook terms it. The analysis of the three
singers allowed these issues to be explored. Additionally, it enabled the author to
make a reflexive turn and note that within the tradition of classical concert singing it was
rather less likely for 'multi-media' devices to be used, whereas jazz and, to the largest extent,
pop performances were structured around the body as the provider of an additional element
or embellishment to the music.
References
Clarke, E.F. & Davidson, J.W. (1998) The body in performance. In W. Thomas (Ed.)
Composition-Performance-Reception. Aldershot: Ashgate.
Cook, N. (2000) Demise of the Work Ethic: Jimi Hendrix's Improvisation as Performance Art.
Royal Musical Association Conference : Performance 2000, University of Southampton,
April.
Cutting, J.E. and Kozlowski, L.T. (1977) Recognising friends by their walk: Gait perception
without familiarity cues. Bulletin of the Psychonomic Society, 9, 353-56.
Cutting, J.E., Proffitt, D.R. and Kozlowski, L.T. (1978) A biomechanical invariant for gait
perception. Journal of Experimental Psychology: Human Perception and Performance, 4,
357-72.
Cutting, J. E. and Proffitt, D.R. (1981) Gait perception as an example of how we may
perceive events. In R.D. Walk and H.L. Pick (eds) Intersensory Perception and Sensory
Integration, New York: Plenum.
Davidson, J.W. (1993) Visual perception of performance manner in the movements of solo
musicians. Psychology of Music, 21, 103-13.
Davidson, J.W. (1994) What type of information is conveyed in the body movements of solo
musician performers? Journal of Human Movement Studies, 6, 279-301.
Davidson, J.W. (forthcoming September 2000) Understanding the expressive movements of
a solo pianist. Deutsche Jahresbuch fur Musikpsychologie
Ekman, P. and Friesen, W.V. (1969) The repertory of nonverbal behaviour: Categories,
origins, usage, and coding. Semiotica, 1, 49-98.
Ellis, A. and Beattie, G., (1986) The Psychology of Language and Communication, London:
Weidenfield and Nicolson.
Gellrich, M. (1991) Concentration and Tension, British Journal of Music Education, 8,
167-79.
Runeson, S. and Frykholm, G. (1983) Kinematic specification of dynamics as an
informational basis for person-and-action perception: Expectations, gender recognition, and
deceptive intention. Journal of Experimental Psychology: General, 112, 585-615.
Proceedings paper
In contrast to this similarity-based classification, a second strand of research has highlighted the importance of
explicitly defined concepts, or theory-based classification (e.g. Murphy & Medin, 1985; Rips, 1989). According to
this research, similarity is insufficiently clear and constrained to act as an explanation of categorisation (e.g. 'whale'
and 'bat' are members of the category 'mammal' despite perceptual dissimilarity) and suggests that we categorise
not on the basis of clusters of similarity but on the basis of selecting the concept that best explains the instance to
be categorised. The role of this kind of classification is also supported by developmental research, where it has
been argued that children's concepts are first based upon similarity but are later replaced by more theory-like
categorisations (Keil, 1989). (This distinction between similarity-based and theory-based classification resembles
the distinction between 'natural' and 'artificial' categories, and Zbikowski's Type 1 and Type 2 classifications).
Overlapping with this, a distinction is often drawn between perceptual ('surface') similarity based on immediately
obvious (usually visual) features of objects (e.g. colour, shape), and theory-based ('deep') similarity, which suggests
that it is not only the surface appearance of objects/events that determines conceptual distinctions but
that aspects of their deeper character are also involved (Keil, 1989; Medin & Ortony, 1989; Rips, 1989).
One question raised by these uses of the terms 'surface' and 'deep', therefore, is whether they have the same
meaning in the psychological as in the musicological context. In both cases the notion of a polarity between 'deep' and 'surface'
similarity seems to lead to a rather limited understanding of what might be responsible for creating similarity.
Hampton has suggested a revised model of similarity and categorisation in which he cites evidence for the role of
similarity in categorisation (Hampton, 1997): namely, the fuzziness of concepts, the degree of flexibility and
context sensitivity of conceptual categories and the ability of similarity to pervade attempts to reason logically. He
concludes that similarity-based categorisation is a widespread phenomenon but that it should be broadened to
encompass information which goes beyond the perceptual appearance of objects and that although we have the
ability to think in a precise, logical fashion, this is generally more difficult and requires training. In sum, he
suggests that similarity forms the basis of people's concepts most of the time (Hampton, 1997: 109).
Adopting this understanding of similarity and the continuity between 'surface' and 'deep' structures we
conceptualise similarity relationships in music in terms of a continuum of levels of structure, ranging from very
deep through to very surface. We can map this onto Dowling's (1982) conceptual framework for levels of pitch
organisation. Dowling describes musical systems at four levels of abstraction from the actual notes of real melodies
(Figure 1). The first, most abstract level of the psychophysical scale represents the physical materials of the pitch
continuum produced by logarithmic frequency. The second level of tonal material represents the set of pitches in
use within a particular musical system. For Western tonal music this represents the twelve chromatic notes which
divide the octave (comprising all the notes available in instruments of fixed pitch such as the piano). At the third
level, the tuning system is a subset of the available pitches used in actual musical pieces, and for Western tonal
music this typically comprises the diatonic scale. The final, most concrete level of mode provides an anchor for
frequencies, a tonal focus in the tuning system, and a tonal hierarchy determining which notes are more important
within the pitches of the tuning system. This level of mode equates roughly to tonality.
Figure 1
Dowling's pitch framework (shown for Western tonal music)
Our preliminary review of the features of music to which listeners have been found to show sensitivity in various
settings and under various conditions will thus adapt this framework to consider many dimensions of music in this
'levels of depth/abstractness' manner. The theoretical views of similarity reviewed above can now be applied to
consider how listeners might make sense of relationships between different parts of a piece of music whilst they are
listening to it. We first apply the framework outlined above to consider how the piece might be perceivable, before
looking at research focusing specifically on how the piece is actually perceived.
The music analysis literature implies that compositions are unified by their underlying thematic connections, but
that surface differentiation is included to create interest and variety. An analysis of any given piece of music should
thus show that the more surface features of similarity do not necessarily serve to emphasise the underlying thematic
similarity. There are (functional) places in a piece where we might expect all the levels of similarity relations to
reinforce one another. For example, at the start of a piece it would be important to establish the pattern of what will
follow by highlighting the critical features. This notion is consistent with theories of key derivation (Brown, Butler
& Jones, 1994) and of metre induction (Povel & Essens, 1985) which indicate that the clearest and most
unambiguous statements occur at the beginning of a piece of music to orient the listener and provide a guiding
framework. It also concurs with the notion of psychological essentialism (Medin & Ortony, 1989), which proposes
that features of the world tend to co-occur with both surface and structural similarities in order to assist us in
making sense of important events and objects in the world.
This then suggests that there will be a network of similarity relations within a single piece of music. Some parts of
the music will be strongly related at a range of different levels: for example, the exposition and recapitulation in
sonata form (which are often identical, with the exception of occurring at different time points). Other parts of the
music may share more surface similarities but with underlying fundamental differences. These would be the
similarity relations that 'deceive' the casual listener. Still other parts of the music may share deeper similarities but
not more surface similarities (again, deceiving the casual listener that they are different when after repeated
listening, for example, the similarities may reveal themselves). It is also important to note that there will be parts in
the majority of musical styles which are dissimilar on many levels.
Another important point relates to the way that the particular musical style, and further, the particular musical piece
itself, both set constraints on similarity. For instance, a tonal classical piece will have certain boundaries around the
acceptable tonalities employed within it - an example of style-specific similarity. A piece written for solo piano
will typically not suddenly introduce the sound of a domestic hoover in its final bars. Less extremely, each piece of
music sets up its own similarity criteria. It is thus difficult to set out precisely how similarity relations operate at
more generic levels since these are likely to be highly context-specific (to draw on another notion of psychological
similarity). This issue is theorised elsewhere through the notion of frames (Barsalou, 1992; and its application to
music in Zbikowski, 1999).
Empirical studies of the perception of similarity in music have found evidence for prototype effects, and
differences in the extent to which different kinds of motivic transformations affect recognition and similarity
judgements. Using specially constructed stimuli, Welker (1982) found a melodic prototype received high
recognition and similarity judgements whilst higher-order transformations led to decreases in accuracy and lower
similarity ratings. However, Rosner & Meyer (1986) found surface features of contour and rhythm influenced
similarity judgements as much as underlying melodic processes. Judgements are also influenced by experience and
exposure to the particular piece in question: Francès (1988) found musically inexperienced subjects unable to
identify correctly the two themes of a classical piece, erroneously labelling them as new material (having been
unduly influenced by surface differences), while Pollard-Gott (1983) showed that after a single hearing of a
classical tonal piece, listeners' similarity and descriptive ratings of passages drawn from the piece were primarily
influenced by surface features of the passages such as loudness and complexity of texture. Thematic relationships
only emerged as important with subsequent hearings. Similar findings have been obtained by Dowling and Bartlett
(1981) showing listeners' failure to confuse 'related' thematic material with the 'target' materials in a recognition
task, which contrasts with short-term comparison studies where contour-preserving variants of melodies are often
confused as 'same' (Bartlett & Dowling, 1980). Furthermore, Chapin (1982) found 8 year old children performed
better on a thematic recognition task than adult pianists and choir members, suggesting that different strategies may
be employed at different points in development.
Some of the contradictions which these findings raise may be related to the experimental approaches adopted.
Where instructions, training and musical materials are relatively complex (Francès, 1988; Pollard-Gott, 1983),
thematic recognition or robust similarity judgements based upon thematic criteria appear to be slow to emerge,
whilst in less cognitively demanding situations such relationships appear more easily extractable. In a slightly
different vein, Leonard Meyer, Eugene Narmour and Robert Gjerdingen have provided stimulating discussions of
the perception of archetypes and schemata in music, and Lawrence Zbikowski shows how categorisation can be
used to account for the role and effects of motives and motivic structure (Zbikowski, 1999), but as yet there have
been only limited attempts to investigate such issues empirically.
Empirical Study
Method
Forty university students participated in the study: twenty with and twenty without musical training. The materials
were two different pieces of piano music: Beethoven's piano sonata Op. 10 no. 1, first movement, and Schoenberg's
Klavierstück Op. 33a. The Beethoven piece was chosen on the basis of Réti's analysis of it, which reveals a single
basic shape underlying a variety of surface manifestations. The Schoenberg piece was selected as an example of a
dodecaphonic piece which is built on a hexachordal combinatorial series, providing a basic shape underlying both
themes in serial sonata form, and because Schoenberg makes explicit reference to the importance of motives and of
motivic development in his own writings (e.g. Schoenberg, 1967). As well as incorporating two themes, the piece
involves a 'modulation' back to the home transposition at the recapitulation. Both pieces thus use the principle of
developing variation within a sonata form, with more than one thematic group, and are written for the same
instrument, and neither were particularly well known to the participants.
Nine extracts were selected from each piece on the basis of the theoretical framework elaborated above, with some
sharing many features on many levels, others sharing more surface elements but not deeper elements, still others
sharing deeper elements but not surface elements, and others with very low levels of similarity at any level. The
extracts were paired and the order of presentation to subjects began for each piece with the opening motive as the
first extract presented. This was to set up the piece-specific criteria for real-life similarity perception.
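Assuming the nine extracts per piece were paired exhaustively, the number of unordered pairs can be checked directly (a hypothetical sketch; the extract names are invented placeholders). Nine extracts give C(9, 2) = 36 pairs, which matches the 35 degrees of freedom reported for the repeated-measures factor in the analysis below.

```python
# Sketch (assumption): exhaustive pairing of the nine extracts per piece.
# Extract names are invented placeholders, not labels from the study.
from itertools import combinations

extracts = [f"extract_{i}" for i in range(1, 10)]  # nine extracts per piece
pairs = list(combinations(extracts, 2))            # every unordered pair, once

print(len(pairs))  # C(9, 2) = 36
```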
Following one complete hearing of each piece, listeners heard the 36 pairs of extracts and judged each pair for
similarity on a scale of 1 to 11, where 1 represented minimal similarity and 11 represented maximal similarity.
They also heard each extract individually and provided a series of adjective ratings on bipolar scales, although
these results are not discussed here due to limitations of space.
Results
Similarity judgements
A two-way analysis of variance with pair-wise comparisons as the repeated measure revealed a significant effect of
fragment pairs for both the Beethoven (F(35,1190)=13.862, p<.0001) and the Schoenberg piece (F(35,1190)=23.196,
p<.0001). In neither case was there any effect of experience (trained versus untrained), although the interaction
between the similarity ratings on pairs of fragments and experience approached significance in the case of the
Schoenberg piece (F(35,1190)=1.408, p=.0588).
The similarity ratings made on pairs of fragments were analysed for each piece separately using multi-dimensional
scaling to reveal any underlying criteria in the similarity judgements. Given that the interaction between similarity
judgements and experience approached significance for one of the pieces, the results for the trained and untrained
listeners were analysed separately.
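The paper does not state how the ratings were prepared for the scaling routine; a common preprocessing step, sketched here as an assumption with invented numbers, is to invert each mean similarity rating on the 1-11 scale into a dissimilarity and assemble the symmetric matrix that an MDS routine takes as input.

```python
# Hypothetical preprocessing sketch: mean similarity ratings (1-11 scale, as in
# the study) inverted into dissimilarities and placed in a symmetric matrix,
# the usual input to multidimensional scaling. All values are invented.

def dissimilarity_matrix(mean_ratings, n_extracts, scale_max=11):
    """mean_ratings maps pairs (i, j) with i < j to a mean similarity rating."""
    d = [[0.0] * n_extracts for _ in range(n_extracts)]
    for (i, j), s in mean_ratings.items():
        d[i][j] = d[j][i] = float(scale_max - s)  # high similarity -> low distance
    return d

# Toy data for three extracts: 0 and 1 judged very similar, 0 and 2 dissimilar.
ratings = {(0, 1): 10.0, (0, 2): 2.0, (1, 2): 3.0}
d = dissimilarity_matrix(ratings, 3)
```

The diagonal stays at zero (an extract is maximally similar to itself), and the matrix can then be passed to any standard MDS implementation.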
Beethoven similarity ratings
Four dimensions emerged as the best solution (stress levels of the MDS solution were approximately .05 in each
case - a relatively low value). Interpretations of these dimensions are given below.
Musicians:
• Pitch shapes and intervals. All the melodic material is very similar, so the differences that there are seem to
be due to emphases on particular notes: one cluster of fragments is characterised by a G to G contour with
Ab-G motion, while another is characterised by the interval of a minor sixth (G to Eb) and semitone
appoggiaturas. Absolute pitch seems to be an important factor here rather than scale step or degree.
• Thematic function and tonality. Fragments are polarised with the initial statement of the first theme in the
minor key judged furthest from the beginning of the development (same material as theme 1) in the major
key, with the 'new' development theme close to this.
• Well-formedness and closure. At furthest remove are those fragments with perfect cadences and one which
ends on a prolonged dominant, with more and less cadencing sequences in between.
• Major/minor and modulation. The 'most' minor within the similarity space of the MDS is the subdominant
minor (F minor) rather than the home key (C minor), which appears in the middle of the space. A fragment
with a major/minor contrast is the most major (Ab to F minor), with other major key fragments close by,
perhaps because the major-minor contrast highlights the 'majorness' of the major key.
Non-musicians:
• Pitch shapes. Although pitch shapes emerge as an important similarity criterion again, in this case the
emphasis is on particular notes rather than shapes, i.e. those fragments with the same pitches are judged as
most similar (Eb, G, etc.) rather than those with the same intervals.
• Thematic function and salience. The first and second themes are clustered together at one end of the MDS
with restatements and recapitulations at the other. This factor is more apparent for the non-trained listeners
than it is for the trained listeners: the non-trained listeners treat the first and second themes as very similar on
this dimension, whereas for the trained listeners the preparation for the second theme seems more salient, and
the first theme is polarised by mode.
• Rate of harmonic change. One-bar harmonic changes are positioned furthest from the more flowing
harmonic changes which happen over two or four bars.
• Major/minor and metric emphasis. This dimension appears to be a combination of tonality and of which
beat in the bar is emphasised, with syncopated and minor-mode fragments maximally distant from metrically
clearer 4/4, major fragments.
In sum, the non-trained listeners appeared to be focusing more on less detailed levels: pitch shapes alone rather
than pitch shapes and intervals, thematic function and salience rather than thematic function and tonality, and
temporal aspects like harmonic change and metric emphasis rather than more syntactic and style-specific
parameters such as modulation or closure. The non-trained listeners also seemed to be focusing on the identity of
the two main themes and differentiating less between other material which is often grouped together.
Schoenberg similarity ratings
Musicians:
• Thematic function and salience. The first statements of thematic material are grouped together and judged
maximally different from restatements of material which are also grouped together. Less salient material is
placed between these extremes.
• Register and (conjunct/disjunct) melodic motion. Three groups of fragments emerge in the MDS: fragments
that are static and registrally constrained, those involving intervallic oscillations, and fragments characterised
by stepwise descents.
• Texture. On this dimension, three groups of fragments emerge characterised by different textures: melody
with accompaniment, fragments including both chords and arpeggios, and fragments using homophonic
chordal movement in crotchets.
• Global contour. Fragments on this dimension are characterised by different kinds of contour: from those
descending and then ascending, to those ascending and then descending, then ascending fragments only.
Non-musicians:
• Thematic function and salience. As with the trained musicians, first statements of thematic material are
grouped together at one extreme of the dimension, and restatements of the same thematic material are
grouped at the other extreme, with less salient material in between.
• Length and tempo. Shorter and faster extracts appear to be polarised from longer and slower ones.
• Melodic motion (conjunct/disjunct). The influence of conjunct or disjunct melodic movement is clearer
here than in the case of the trained musicians: fragments involving stepwise versus disjunct melodic
movement are clearly arranged along this dimension.
• Global contour and complexity. Those fragments involving simple ascending and descending movement
are placed furthest from those with more complex contour and more changes in contour on this dimension.
Perhaps unsurprisingly, given the findings of previous research, there is no contribution of row structure to
listeners' ratings. Both trained and untrained listeners appear to be focusing on the thematic function and/or
salience of fragments in the Schoenberg piece (unlike the situation for the Beethoven piece where tonality appeared
to be interacting with thematic function for the trained musicians). Once again, the influence of temporal aspects of
the music appears to be a stronger factor for the untrained listeners.
Conclusions
The results of this study suggest that in tonal music, listeners' judgements are characterised by pitch, tonality and
thematic function, with additional influences for non-musicians of tempo-based features (harmonic rhythm and
metric emphasis). In atonal music, listeners' judgements are characterised by thematic function, but also show the
influence of factors which could be characterised as being more 'surface' features of the music, namely melodic
motion and texture, length, tempo, and contour. These similarity judgements seem to be piece- and/or style-specific
(though in order to verify this the empirical study would need to be repeated with a range of other tonal and serial
pieces).
If these differences are indeed style-specific, there may be two possible reasons for this. The first explanation is
that these differences are a result of differing degrees of familiarity with the two styles: tonal music can be assumed
to be the more familiar of the two styles on the basis of likely exposure, and it may be that the structural principles
underlying it are better internalised than those for serial music. A second possible explanation is that the differences are
due to 'essential' features of the musical styles: perhaps tonal music is more 'obvious' in its delineation of motivic
and thematic similarity - a possibility which would require further research. One further point to emerge from this
study is that the similarity judgements are also context-sensitive: listeners treated the pieces as setting the
constraints on similarity, rather than using global context-independent features like loudness or pitch height. This is
congruent with observations that categorisation and similarity are context-dependent.
We wish to highlight here, however, one point emerging from the similarity ratings: the polarisation of first
statement material versus elaboration/restatement on one dimension. This polarisation runs exactly counter to the
intuitive expectation that a dimension of 'theme' would emerge (i.e. that listeners would judge all statements of the
first theme as similar and maximally different from all statements of the second theme). In fact, listeners appear to
be judging a theme and its restatement as maximally dissimilar on one dimension; no dimension of 'theme' emerges
as such. There are two possible explanations for this. First, listeners may be sensitive to the rhetorical role of
material, hearing the extracts in terms of their characteristics as 'statement' or 'elaboration' / 'restatement'. Although
music theorists recognise the rhetorical character of material (e.g. Agawu, 1991), its pertinence for the listener
remains to be verified. Therefore we consider a second possible explanation: it may be that listeners' similarity
judgements are influenced by the relative salience of extracts. According to theories of salience and the asymmetry
of similarity judgements (e.g. Tversky, 1977), well-formed objects or events are judged both more similar and
more different than less salient material. This would explain the polarisation found on this dimension, where
thematic material (which could be assumed to be well-formed and salient) is judged as being maximally different
from its own elaborations, and where extracts from the development and coda (which might be assumed to be less
salient) are located in the middle of the dimension.
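Tversky's (1977) contrast model, which underlies this salience-based explanation, can be sketched with invented feature sets; the weights and features below are illustrative assumptions, not drawn from the study.

```python
# Tversky's feature-contrast model: s(a, b) = theta*|A∩B| - alpha*|A-B| - beta*|B-A|.
# With alpha > beta, features distinctive to the subject of comparison count
# against similarity more than features distinctive to the referent, so a
# feature-rich (salient, well-formed) theme and its sparser elaboration are
# judged asymmetrically. Weights and feature sets here are invented.

def tversky_similarity(a, b, theta=1.0, alpha=0.7, beta=0.3):
    return (theta * len(a & b)
            - alpha * len(a - b)   # features distinctive to the subject a
            - beta * len(b - a))   # features distinctive to the referent b

# Hypothetical feature sets: the well-formed theme is assumed feature-rich,
# its elaboration sparser.
theme = {"opening_motive", "tonic_key", "eight_bar_phrase", "balanced_cadence"}
elaboration = {"opening_motive", "eight_bar_phrase", "dominant_key"}

s_theme_to_elab = tversky_similarity(theme, elaboration)
s_elab_to_theme = tversky_similarity(elaboration, theme)
# The elaboration is judged more similar to the theme than the theme is to
# the elaboration, mirroring Tversky's variant-to-prototype asymmetry.
```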
This finding questions the basis for salience and well-formedness in music. Salience, well-formedness, and
thematic function have not previously been found to be important, perhaps because it is harder to pin down and
formalise 'salience' than it is to describe harmonic relations or melodic progressions. Indeed, this study suggests
that further research is needed in order to determine what constitutes salience and well-formedness in music. The
present findings suggest only that they are realised differently in different styles or pieces: for example,
well-formedness in Schoenberg appears to correspond to a hexachordal statement of the row, whereas
well-formedness in Beethoven is an eight-bar phrase with antecedent and consequent, and balanced harmonic
movement.
The implication of this main effect of salience for compositional practice is that material which is well-formed
causes perception of both more similarities and more differences. This is an effective compositional strategy and is
congruent with the notion of 'connected antithesis' as central to the creation of tension and formal structure since it
maximises diversity and unity. If the unity of the piece is assumed, then diversity will be the major outcome of
similarity judgements, which makes the task of empirical research into this area all the more problematic.
This consideration of compositional practice and theory also highlights the importance of considering not only how
far this focus on coherence and unity is a compositional strategy but to what extent it is a received listening
ideology.
Acknowledgements
This research was funded by an Arts and Humanities Research Board small research grant.
References
Agawu, V.K. (1991). Playing With Signs: A Semiotic Interpretation of Classic Music. Princeton, NJ: Princeton
University Press.
Alegant, B. (1996). Unveiling Schoenberg's op.33b. Music Theory Spectrum, 18(2), 143-166.
Barsalou, L.W. (1992). Cognitive Psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Bartlett, J.S. & Dowling, W.J. (1980). Recognition of transposed melodies: a key-distance effect in developmental
perspective. Journal of Experimental Psychology: Human Perception and Performance, 6, 501-515.
Brown, H., Butler, D. & Jones, M.R. (1994). Musical and Temporal Influences on Key Discovery. Music
Perception, 11, 371-407.
Chapin, S.S. (1982). Extracting an Unfamiliar Theme from its Variations. Psychomusicology, 2, 48-50.
Deliège, I. & Mélen, M. (1997). Cue abstraction in the representation of musical form. In: I. Deliège & J.A.
Sloboda (Eds.), Perception and Cognition of Music, Hove: Psychology Press.
Dowling, W.J. (1982). Musical scales and psychophysical scales: Their psychological reality. In: R. Falck & T.
Rice (Eds.), Cross-cultural perspectives on music, Toronto: University of Toronto Press, 20-28.
Eysenck, M.W. & Keane, M.T. (1990). Cognitive Psychology: A Student's Handbook. Hove: Lawrence Erlbaum.
Francès, R. (1958/1988). The Perception of Music. London: Lawrence Erlbaum (translated by W.J. Dowling).
Hampton, J. (1997). Similarity and Categorization. Proceedings of SimCat 1997: An Interdisciplinary Workshop on
Similarity and Categorisation, M. Ramscar, U. Hahn, E. Cambouropolos and H. Pain. (Eds.), Dept. of Artificial
Intelligence, Edinburgh University, 37-41.
Keil, F.C. (1989). Concepts, Kinds and Cognitive Development. Cambridge, MA: MIT Press.
Komatsu, L.K. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500-526.
Medin, D.L., Goldstone, R.L. & Gentner, D. (1993). Respects for similarity. Psychological Review, 100(2),
254-278.
Medin, D. & Ortony, A. (1989). Psychological essentialism. In: S. Vosniadou & A. Ortony (Eds.), Similarity and
Analogical Reasoning, Cambridge: Cambridge University Press.
Medin, D.L. & Schaffer, M.M. (1978). Context theory of classification learning. Psychological Review, 85,
207-238.
Meyer, L.B. (1973). Explaining Music: Essays and Explorations. Berkeley, CA: University of California Press.
Murphy, G.L. & Medin, D.L. (1985). The role of theories in conceptual coherence. Psychological Review, 92,
289-316.
Pollard-Gott, L. (1983). The emergence of thematic concepts in repeated listening to music. Cognitive Psychology,
15, 66-94.
Povel, D.-J. & Essens, P. (1985). Perception of temporal patterns. Music Perception, 2, 411-440.
Réti, R. (1951). The thematic process in music. New York: The Macmillan Company.
Rips, L.J. (1989). Similarity, typicality and categorisation. In S. Vosniadou & A. Ortony (Eds.) Similarity and
Analogical Reasoning. Cambridge: Cambridge University Press.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General,
104(3), 192-233.
Rosch, E. & Mervis, C.B. (1975). Family resemblances. Cognitive Psychology, 7, 573-605.
Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M. & Boyes-Braem, P. (1976). Basic objects in natural
categories. Cognitive Psychology, 8, 382-439.
Schoenberg, A. (1967). Fundamentals of musical composition. (G. Strang & L. Stein, Eds.). London: Faber & Faber
Limited.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Welker, R.L. (1982). Abstraction of themes from melodic variations. Journal of Experimental Psychology: Human
Perception & Performance, 8, 435-447.
Zbikowski, L.M. (1999). Musical coherence, motive, and categorization. Music Perception, 17(1), 5-42.
Proceedings paper
(a) finding music, (b) daydreaming, (c) talking, (d) fiddling with instrument, (e) expressing
frustration, (f) resting, and (g) being distracted.
Clarissa was 9 years and 7 months old when she joined her school's band program. She chose clarinet
because "It sounded fun", because "My best friend plays clarinet", and also because "The clarinet
teacher looked nice." Before starting clarinet lessons, Clarissa had learnt Suzuki violin for 4 years, but
this was discontinued when she started clarinet. Her intrinsic motivation for learning an instrument
appeared to be something of a family 'story': "I saw someone playing violin on TV when I was about 3
and I asked my Mum if I could learn an instrument."
Clarissa's school academic results show that she was intellectually very capable. Grade 3 literacy and
numeracy tests that are routinely conducted in New South Wales schools ranked her in the top 8% of
her age group. In addition, she scored in the top 20% of her age group for musical aptitude, as
measured by Gordon's (1982) Intermediate Measures of Music Audiation. Despite these
above-average academic and music aptitude results, Clarissa's actual musical achievement was much
closer to the average of our sample of 157 children. Her score on the Watkins-Farnum Performance
Scale (Watkins & Farnum, 1954), a standardised sight-reading test, was 15 in Year 1 of the study,
compared with a mean of 15.2 (SD = 10.9). One year later, her score of 17 had not risen substantially,
compared with the total sample's mean of 24.5 (SD = 13.0). In Year 3, Clarissa's score rose more
sharply to 28, bringing it closer to the sample mean of 29.2 (SD = 14.7).
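The gap between Clarissa and the sample can be made explicit as z-scores computed from the means and standard deviations just reported (a simple arithmetic sketch; the year-2 figures are those introduced above as 'one year later').

```python
# z-scores for Clarissa's Watkins-Farnum scores against the sample statistics
# reported in the text: (her score, sample mean, sample SD) for each year.

def z_score(score, mean, sd):
    return (score - mean) / sd

scores = {1: (15, 15.2, 10.9), 2: (17, 24.5, 13.0), 3: (28, 29.2, 14.7)}
zs = {year: round(z_score(*vals), 2) for year, vals in scores.items()}
# Year 1 sits almost exactly at the mean, Year 2 falls well below it,
# and Year 3 returns close to the mean, matching the narrative above.
```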
Previous research with schoolchildren has established the link between musical achievement and
accumulated practice time (Sloboda, J. W. Davidson, Howe, & Moore, 1996). The slow pace of
Clarissa's musical achievement might thus be partly explained by the quantity of practice she
undertook: regular interviews with her mother in Year 1 provided us with an estimate of only
6.8 minutes of average daily practice, slightly lower than the total sample's average of
7.3 minutes (SD = 4.7).
We chose Clarissa for this case study because she fell mid-way on our measures of practice time and
musical achievement, and also on many behavioural measures in the sub-sample of subjects whose
videotaped practice we analysed (McPherson & Renwick, 2000). Her videotaped behaviour suggests
that it may not have been simply the small quantity of practice that led to sub-optimal musical
achievement, but also the cognitive and motivational quality of that practice. The purpose of this
study, therefore, was to investigate the cognitive and motivational processes of Clarissa's practice
behaviour in Year 1, and to compare these with results from Year 3. This provided the basis for us to
speculate about possible explanations for the wide variance found in Clarissa's task engagement and
persistence.
Year 1
In the two recorded practice sessions analysed from Year 1, Clarissa was sitting on the couch in her
family's lounge room. Interview data tells us that practice sessions regularly occurred at 7:45 a.m. and
that this was the last activity in Clarissa's routine before leaving for school. In the interview at the end
of Year 1, Clarissa stated that both parents and her younger sister often listened to her practice, and
that they would "stand and watch and clap afterwards." This family support was evident in the
behaviour of both parents and the sister on the videotapes: practice was clearly an activity in which
the whole family was involved.
The role of the parents was mostly restricted to listening and making encouraging comments, although
there were some instances where Clarissa and her mother discussed the material that should be
practised:
Gottfried, 1985) as demonstrating a lack of intrinsic motivation, but with Clarissa, it seemed that there
was a strong association between intrinsic motivation and the pleasure that she took in playing easier,
familiar melodies. The less familiar, more difficult pieces were played so haltingly and with so many
errors that they seemed to seriously undermine Clarissa's intrinsic desire to practise this type of piece.
As she said in an interview at the end of Year 1: "I don't like learning hard pieces because I find it
annoying." When asked, in the same interview, what she considered the most important thing to do
when practising, Clarissa stated that it was to "play my favourite songs."
One possible explanation for this clear distinction between 'hard' and 'easy' pieces in Clarissa's
practice accuracy and self-reports may be related to the clarity of the goal-state in the problem-solving
activity of decoding the notation (J. E. Davidson & Sternberg, 1998). For familiar children's melodies
such as Old Macdonald, Clarissa seemed quite capable of using her mental representation of the
melody as a template against which to measure her performance. Her success in playing this melody
correctly seemed to give her pleasure. However, with melodies that were not familiar, her attentional
resources seemed to be exhausted by the act of decoding the musical notation while playing, and she
therefore seemed unaware of the majority of the errors that she failed to correct. Interestingly, this
may have been less of a problem when Clarissa was learning violin through the Suzuki method
because recorded models of set pieces are used as a reference point and could have been used by
Clarissa and her parents as a way of checking her accuracy. As her mother stated in an interview:
With Suzuki, I knew every note. I don't feel I know enough about what she should be
doing with clarinet; that's the style of teaching.
It would seem from the child's and mother's comments, as well as our observational data, that the
instructional method that Clarissa was using in Year 1 (typical of most of the children in our study) was
prone to inducing cognitive overload (Sweller, Van Merrienboer, & Paas, 1998) because its lack of
sequencing and of a clear goal-state led to excessive demands on working memory. Given that
Clarissa's motivational orientation was most optimally engaged when she played her "favourite
songs", it is not surprising that she did not show the persistence necessary to overcome the cognitive
overload inherent in her approach to new material.
Year 3
The observational and interview data for Year 3 reveal some interesting factors that changed over the
intervening 2 years. One clear change was in the practising environment. The time set for practice
remained at 7:45 a.m., but the venue had changed from the lounge room to the quieter dining room. In
the videotapes, Clarissa appeared alone, and in the interviews she stated that her parents now only
listened to her practice "sometimes".
The content of practice had progressed to more difficult material, but was still dominated by repertoire
for 92% of the time she spent playing. Despite Clarissa's insistence in the Year 3 interview on the
importance of practising technical work, this aspect made up only 8% of her time spent playing on the
videotapes. Time spent not practising was almost exclusively taken up with looking for music to play
(94%), with the remainder (6%) spent resting. This increased resting time may reflect the longer
pieces now being practised. Because there were no other people present, no talking was observed.
Clarissa's predominant approach to practising remained playing a piece or exercise through once and
correcting some errors on the way: 96% of playing time occurred within the first run-through of a
piece. This behaviour contrasts with Clarissa's description of her own practising methods in the Year 3
interview:
I normally play the piece all the way through and then come back to the bits that are bad
Comparison of the pattern of errors (Table 1) reveals additional large differences. With Golden
Wedding, Clarissa was more likely to repeat sections longer than two notes, and less likely to repeat
only one or two correct notes than in the three classical pieces. In other words, in this one piece, she
demonstrated behaviour that Gruson (1988) found to be associated with expertise level. It was also
only in Golden Wedding that Clarissa showed any signs of deliberately altering her tempo when
repeating sections, another important component of the approach of experts (Miklaszewski, 1989;
Nielsen, 1999). The appearance of this strategy in Clarissa's otherwise novice behaviour was
unexpected.
These large differences in practice behaviour between pieces led us to investigate possible
explanations. When Clarissa was interviewed she explained that her desire to learn Golden Wedding,
unlike other pieces, was strongly motivated by her intrinsic interest in the piece. Apparently, in one of
her instrumental lessons, Clarissa's teacher had mentioned that he played a 'jazzy' version of La
Cinquantaine in his big band, and he demonstrated so that she could hear the transformation. Strongly
motivated by her desire to play in a jazz style, Clarissa asked her teacher to notate the theme of
Golden Wedding, so that she could practise it at home. Thus, rather than the task being chosen by the
teacher, as is the usual practice in most lessons, Golden Wedding was chosen by the student.
The notated version of Golden Wedding, hastily sketched out by the teacher, appears on the videotape
to be acting as only a rough prompt for which notes to play. The aural memory of the teacher's
performance was possibly a more vivid prompt. There is a phrase where the melody climbs to notes in
the clarinet's range that Clarissa does not know well, and she uses a trial-and-error approach to find
these notes, by reference to her mental representation. Thus, it would appear that Clarissa was able to
return in Golden Wedding to the pleasurable activity she reported in Year 1 of playing her "favourite
songs" by ear, at the same time as she demonstrated highly atypical task engagement.
Related literature and discussion of findings
Our subject's motivational pattern in Year 3, as she approached adolescence and the transition to high
school, appears from the interview data to be highly multifaceted. For instance, what might seem to
be a noticeable increase in autonomy, as reflected in an increasing desire to practise alone, is qualified by
Clarissa's remarks about her practice being contingent on extrinsic rewards (i.e., pocket money).
When asked to respond to 14 Likert scale items which provided possible reasons for practising her
instrument, the strongest response ("very true of me") was on the scale "Because that's what I'm
supposed to do". Enjoyment-related reasons were scored "not very true of me". Thus, it would appear
that Clarissa's general motivational orientation to school activities was that they were part of her
larger set of obligations:
Playing my clarinet is part of my morning routine, which is part of my job list and I get
paid my pocket money if I do everything on the job list.
Nevertheless, recent research (Pintrich & Schrauben, 1992) has suggested that high levels of extrinsic
motivation can occur together with high levels of intrinsic motivation. Several comments made by
Clarissa in her Year 3 interview revealed a sense of achievement motivation. When asked what was
the most exciting thing that had happened to her musically, she answered:
When I graduated from Book 1 Suzuki when I played the violin.
However, she now seemed unsure of her own achievement level on the clarinet, making this a less
potent motivator:
I don't know if I am going well on the clarinet. Not many people have made any
comments on my playing, so I am not sure.
By Year 3, Clarissa seemed to have changed her attitude towards 'hard' pieces. Asked if she liked to
learn them, she replied: "Yes. It makes the pieces a challenge."
Existing research demonstrates that primary-school children can differentiate levels of intrinsic
motivation for different school subjects, and that each individual also shows a more general, less
domain-specific motivational orientation (Gottfried, 1985). For instance, while Clarissa stated in
Year 3 that practising was more boring than fun, she also said: "I like to practise most of the things
my teacher gives me," which shows that she was able to distinguish between her intrinsic interest in
the repertoire and her dislike for the process of learning it. She was also able to identify tasks that
have extrinsic utility (Eccles, Wigfield, & Schiefele, 1998) but are not inherently enjoyable.
Asked what were the bad things about learning clarinet, she replied: "You have to keep on playing
your scales, but you need them to play songs."
Recently, there has been a resurgence of research in the area of interest in learning (Krapp, Hidi, &
Renninger, 1992; Schiefele, 1991). This field investigates the domain-specific aspects of intrinsic
motivation, often by observing the effects of differential levels of interest on individuals' processing
of text. Clarissa's observed practice and comments on Golden Wedding can be explained according to
this literature in terms of actualised interest, which Schiefele (1991) defines as content-specific
intrinsic motivation. Asked what were the good things about learning the clarinet, Clarissa replied:
"You get to play fun, jazzy songs." Interest has been found to enhance the subjective quality of the
learning experience and also to influence the quality of learning results, with high-interest subjects
engaging in more intensive and meaning-oriented processing of text (Schiefele, 1991). Interest
(Schiefele, 1991) and task value (McPherson & McCormick, 1998; Pintrich & De Groot, 1990) have
been found to be associated with the use of higher-order learning strategies, as we observed in
Clarissa's practice of Golden Wedding. The notion of task-specificity that is inherent in research on
interest can be observed in the large differences between Clarissa practising her standard repertoire
and her self-selected jazz piece. Similarly, research in expectancy-value motivation is beginning to
clarify these processes in music. For example, O'Neill (1999) found that student musicians'
perceptions of the importance of a musical task predicted the amount of practice they undertook,
while McPherson (2000) found that estimates children made before lessons commenced of how long
they intended learning their instrument predicted their practice time and musical achievement 9
months later. The present observational case study extends and supports these empirical findings.
Conclusions
The results of this case study suggest that with strong enough motivation, even quite young music
learners can engage in the types of self-regulatory behaviour that will enhance their achievement. The
effect that was found in Clarissa's practice of Golden Wedding of intrinsic interest leading to
considerably greater persistence, strategy use, and monitoring of accuracy seems intuitively obvious.
Nevertheless, the majority of instrumental teachers, at least in the English-speaking world, continue
to choose most of the repertoire played by their students, and to base their lessons around a
teacher-directed model where the prime focus of attention is 'learning the notes' (Reid, 1997). We
speculate, therefore, that encouraging students to practise repertoire which they select themselves and
find personally interesting can lead to an increase in cognitive engagement and more efficient
learning. Consequently, a detailed and systematic analysis of data collected in the 3-year longitudinal
study will continue as one means of clarifying this important issue.
Note
This research has been supported by a large Australian Research Council grant (No. A79700682),
awarded for a 3-year study in 1996.
References
Davidson, J. E., & Sternberg, R. J. (1998). Smart problem solving: How metacognition
helps. In D. J. Hacker, J. Dunlovsky, & A. C. Graesser (Eds.), Metacognition in
educational theory and practice (pp. 47-68). Mahwah, NJ: Erlbaum.
Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In W. Damon
(Series Ed.) & N. Eisenberg (Vol. Ed.), Handbook of child psychology: Vol. 4. Social,
emotional, and personality development (5th ed., pp. 1017-1095). New York: Wiley.
Feldstein, S., & O'Reilly, J. (1988). Yamaha band student: A band method for group or
individual instruction (Book 1). Van Nuys, CA: Alfred Publishing Corporation.
Gordon, E. E. (1982). Intermediate measures of music audiation. Chicago: G. I. A.
Publications.
Gottfried, A. E. (1985). Academic intrinsic motivation in elementary and junior high
Proceedings paper
Introduction
The evaluation of short temporal intervals is an interesting topic in research on human cognition
because the basic mechanisms involved are still not clearly understood. Several models attribute the
ability to evaluate time to an internal clock, and have revealed a correlation between the accuracy of
time evaluation and the degree of attention devoted to time (Thomas and Weaver, 1975). Other
models, derived from the influential hypothesis proposed by Ornstein (1969), consider time
evaluation a by-product of cognitive processing, and regard memory coding as the main determinant
of time evaluation accuracy. In recent years, however, many experimental studies have revealed a
more complex picture. Neurophysiological studies (Rammsayer, 1994; Macar & Casini, 1998)
suggest a possible dissociation between the mechanisms underlying the processing of very short
temporal intervals (on the order of a few milliseconds) and longer ones (from a few seconds to a few
minutes). As early as 1984, Fraisse drew a distinction between time perception and time evaluation:
time perception is tied to the psychological present, whereas time evaluation involves memory
(Fraisse, 1984). As Block (1990) pointed out, time evaluation behavior depends on the interaction of
several factors, such as the subject's individual characteristics, internal attributes of the time period,
cognitive activities carried out during the time period, and other time-related behaviors imposed by
environmental or experimental demands. Moreover, Zakay (1990) showed that different methods of
assessing time evaluation involve different cognitive processes (STM or LTM). In particular, a
prospective paradigm (in which subjects know in advance that a temporal judgment will be required)
taps experienced time, whereas a retrospective paradigm (in which subjects are unaware of the
temporal task) taps remembered time.
Previous work has shown that, in the prospective condition, duration estimates are longer when they
are requested after a short interval occupied by interfering cognitive tasks than when they are
requested immediately (Bueno, 1994; Vitulli & Shepard, 1996; Vitulli & Crimmins, 1998). The
interfering task forces subjects to rebuild the duration from memory. We should expect the opposite
pattern (shorter estimates) when an empty time interval is used in a remote estimation paradigm. In
this case the delay reduces the availability of temporal information, and the time estimates should
reflect the attentional resources required by event processing (Kahneman, 1973). Events requiring
complex processing were evaluated as shorter than events requiring simple processing (Craik and
Lockhart, 1972).
Both Poynter (1983, 1989) and Jones and Boltz (1989), although starting from different points of
view, stressed the importance of the segmentation of the flow of events in time evaluation. According
to Jones and Boltz, some natural events are characterized by a high degree of structural coherence
and tend to elicit an automatic mode of attending because their temporal course is more predictable.
On the other hand, events characterized by a low degree of coherence require a controlled mode of
attending and an active segmentation of the flow of stimulation by means of counting and grouping
strategies.
Some studies report that, in the retrospective condition, overestimation of duration increases as event
coherence decreases (Boltz, 1995). Other studies show that, in the prospective condition, irregular
events are underestimated more than regular events (Macar, 1996). Event structure depends on the
interplay between two levels of organization: a vertical organization, corresponding to the periodic
grouping of high-level time units, and a horizontal organization, corresponding to the serial
development of low-level time units. We suppose that both the vertical and the horizontal
organization contribute to modulating attentional factors and memory representation in time
evaluation. In this study an attempt was made to verify this hypothesis.
Method
Twenty-four first-year psychology students participated. Subjects were asked to reproduce the
duration of structured auditory events in a prospective paradigm.
The auditory events were derived from musical fragments composed according to two different
criteria: Tonal (TO) or Non-Tonal (NT) composition, corresponding to the periodic organization; and
Salient (SA) or Non-Salient (NS) characterization, corresponding to the serial organization. These
musical fragments were edited so that only the temporal relations between consecutive pulses were
preserved. Three analyses of variance carried out on the effective durations (mean = 7.44 s; SD =
1.26), on the number of beats (mean = 4.87 beats; SD = 0.81), and on the number of pulses (mean =
22.25 pulses; SD = 5.14) showed no significant differences among the classes.
The role of structural factors in time reproduction was evaluated in relation to the effect of delayed
estimation condition. In the immediate condition the reproduction started shortly after the end of the
sequence. In the delayed condition a twenty-second empty time interval was introduced between the
end of the sequence and the beginning of the reproduction.
A 2 × 2 × 2 within-subjects experimental design was used. The repeated-measures factors were the
delay condition (immediate or delayed), the periodic organization (tonal or non-tonal sequences), and
the serial organization (salient or non-salient sequences) of the event structure. The dependent
variable was the Directional Error, calculated as the ratio between the reproduced duration and the
effective duration of each auditory sequence. Music education, experienced difficulty, interest in the
task, and the attention needed to complete the task were assessed by means of a self-evaluation scale.
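The dependent variable can be written out explicitly. This is a minimal sketch of the stated definition (the function and variable names are ours, not the authors'):

```python
def directional_error(reproduced: float, effective: float) -> float:
    """Directional Error as defined in the text: the ratio of the
    reproduced duration to the effective duration of a sequence.
    Values below 1 indicate under-reproduction; above 1, over-reproduction."""
    return reproduced / effective

# e.g. reproducing a 7.44 s sequence (the mean effective duration
# reported above) as 6.5 s gives a ratio below 1 (under-reproduction):
print(directional_error(6.5, 7.44) < 1)
```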
Results and Discussion
Results show that both the serial and the periodic organization affect time estimation. The serial
organization contributes to the overall accuracy of the estimates independently of the delay of the
reproduction: time reproductions of Non-Salient sequences are less accurate and shorter than those of
Salient sequences in both the immediate and the delayed condition.
The periodic organization, on the other hand, also enhances the processing of events: time
reproductions of Non-Tonal sequences are less accurate and shorter in the delayed condition than in
the immediate condition.
This pattern of results suggests that the serial organization (Salient vs. Non-Salient) modulates
attentional factors, whereas the periodic organization (Tonal vs. Non-Tonal) affects the memory
representation of auditory events in time estimation.
References
Block, R.A. (1990). Models of psychological time. In R.A. Block (Ed.), Cognitive models of
psychological time (pp. 1-36). Hillsdale: Lawrence Erlbaum Associates.
Boltz, M.G. (1995). Effects of event structure on retrospective duration judgments. Perception
& Psychophysics, 57(7), 1080-1096.
Bueno, M.B. (1994). The role of cognitive changes in immediate and remote prospective time
estimations. Acta Psychologica, 85(2), 99-121.
Craik, F.I.M., & Lockhart, R.S. (1972). Levels of processing: A framework for memory
research. Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Fraisse, P. (1984). Perception and estimation of time. Annual Review of Psychology, 35, 1-36.
Jones, M.R., & Boltz, M.G. (1989). Dynamic attending and responses to time. Psychological
Review, 96(3), 459-491.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.
Macar, F. (1996). Temporal judgments on intervals containing stimuli of varying quantity,
complexity, and periodicity. Acta Psychologica, 92(3), 297-308.
Macar, F., & Casini, L. (1998). Brain correlates of time processing. In V. De Keyser, G.
d'Ydewalle & A. Vandierendonck (Eds.), Time and the dynamic control of behavior (pp.
71-82). Seattle: Hogrefe & Huber Publishers.
Michon, J.A. (1990). Implicit and explicit representations of time. In R.A. Block (Ed.),
Cognitive models of psychological time (pp. 37-58). Hillsdale: Lawrence Erlbaum Associates.
Ornstein, R.E. (1969). On the experience of time. Harmondsworth: Penguin Books.
Poynter, W.D. (1983). Duration judgement and the segmentation of experience. Memory &
Cognition, 11, 77-82.
Poynter, W.D. (1989). Judging the duration of time intervals: A process of remembering
segments of experience. In I. Levin & D. Zakay (Eds.), Time and human cognition: A life-span
perspective (pp. 305-321). Amsterdam: Elsevier.
Rammsayer, T.H. (1994). A cognitive-neuroscience approach for elucidation of mechanisms
underlying temporal information processing. International Journal of Neuroscience, 77(1-2),
61-76.
Thomas, E.A.C., & Weaver, W.B. (1975). Cognitive processing and time perception.
Perception and Psychophysics, 17, 363-367.
Vitulli, W.F., & Crimmins, K.A. (1998). Immediate versus remote judgements: delay of
response and rate of stimulus presentation in time estimation. Perceptual & Motor Skills, 86(1),
19-22.
Vitulli, W.F., & Shepard, H.A. (1996). Time estimation: effects of cognitive task, presentation
rate, and delay. Perceptual & Motor Skills, 83(3), 1387-1394.
Zakay, D. (1990). The evasive art of subjective time measurement: some methodological
dilemmas. In R.A. Block (Ed.), Cognitive models of psychological time (pp. 59-84). Hillsdale:
Lawrence Erlbaum Associates.
Proceedings paper
of information about the aggregate structure. Therefore, the resource demands of aggregate pattern
processing differ from those associated with attending to one's own part mainly in terms of the
grouping process that combines elements from different parts. This process, which may be termed
'between-part grouping', requires considerable attentional flexibility as it involves scanning between
parts continuously, and in real time, to determine their interrelationship.
It is assumed here that, apart from between-part grouping, attending to one's own part, on the one
hand, and to the aggregate structure, on the other, are subserved by common processes. Accordingly, the
tracking and representation formation processes associated with the perception of aggregate structures
operate similarly to those employed in relation to one's own part (except that tracking other parts
occurs only via the auditory channel). Likewise, retrieval demands arise if the performer recalls
performance goals relating to the aggregate structure. The importance of such performance goals has
been demonstrated by Goodman (1998, 2000), who investigated how 'model' conceptions of aggregate
structures develop and influence interactions between individual performers in ensembles.
Clearly, due to the substantial overlap between resources involved in attending to one's own part and
the aggregate structure, these two components of prioritised integrative attending should be
susceptible to mutual interference. Two primary sources of interference are identified in ARAMEP:
(a) the degree to which tracking one's own part disrupts the tracking of other parts, and (b) the
disruption of between-part grouping caused by the structural complexity of the interrelationship
between parts. Following from principles in multiple task theory, interference to tracking is related
mainly to the difficulty of one's own part, whereas interference to between-part grouping is largely a
function of the compatibility of one's own part and the aggregate structure. The factors that influence
these considerations are discussed next.
Factors that influence prioritised integrative attending
The difficulty of one's own part is a subjective consideration insofar as it is the product of attentional
and motor constraints that operate within the individual performer. Thus, difficulty is prone to be
affected by general, relatively extramusical factors (i.e., qualities of the performer rather than the
music) that are related to both attention and motor control, such as anxiety, arousal, mastery of
instrumental technique, familiarity with the music in question, and other factors relating to musical
expertise (see Sloboda, 1996). However, there are also some more objective determinants of
difficulty. These include several musical factors that have been found to affect attentional processes
under a wide variety of circumstances, e.g., rhythmic complexity (e.g., Bharucha & Pryor, 1986;
Essens, 1995; Handel, 1973; Handel, 1992; Jones & Boltz, 1989; Klein & Jones, 1996; Povel &
Essens, 1985), pitch-related factors such as the size of the interval between adjacent tones and
melodic contour (e.g., Dowling, 1978; Dowling & Bartlett, 1981; Dowling, Kwak, & Andrews, 1995),
and tonality and harmonic context (e.g., Dawe, Platt, & Racine, 1993; Holleran, Jones, & Butler,
1995; Krumhansl, Sandell, & Sargeant, 1987; Palmer & Krumhansl, 1987; Schmuckler, 1989; Smith
& Cuddy, 1989). It is assumed in ARAMEP that these factors disrupt the tracking of other parts to the
extent that they augment the demands associated with producing and monitoring one's own part.
It is also assumed in ARAMEP that the compatibility of one's own part and other parts may affect
both tracking and between-part grouping, but only the latter is of primary concern here. In multiple
task theory, two broad varieties of compatibility have been identified: spatial and temporal (Wickens,
1989, 1991). Both spatial and temporal compatibility are relevant in music, where their analogues are
pitch and rhythm, respectively (see Jones, 1981, 1992).
The compatibility of the relationship between parts in terms of pitch depends upon factors such as
tonality and pitch range. In particular, the between-part grouping process should be more difficult
when parts are in different keys (as in polytonality, see Krumhansl, 1990; Thompson & Mor, 1992) or
when there is a large pitch separation between parts. The latter is exemplified in auditory streaming
demonstrations where (at certain presentation rates) a sequence of alternating high- and low-pitch
tones is perceived as a single stream when pitch separation is narrow, whereas the sequence
segregates into a stream of high tones and a stream of low tones when pitch separation is wide (see
Bregman, 1990; Jones & Yee, 1993).
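The streaming demonstration just described can be caricatured as a threshold rule. This toy sketch is our own illustration, not a model from the streaming literature, and it deliberately ignores the dependence on presentation rate noted above:

```python
def n_streams(pitches, threshold=5):
    """Toy segregation rule: an alternating tone sequence splits into
    two streams when the adjacent pitch separation (in semitones)
    exceeds `threshold`; otherwise it is heard as a single stream."""
    gaps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return 2 if max(gaps) > threshold else 1

print(n_streams([60, 62, 60, 62]))  # narrow separation: one stream
print(n_streams([60, 72, 60, 72]))  # wide separation: two streams
```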
Temporal compatibility in ensemble music is a matter of rhythmic complexity, i.e., the coherence of
the temporal relationship between the parts comprising the multipart texture. Lack of coherence has been
found to interfere with between-part grouping in studies of auditory streaming (Jones, Kidd, &
Wetzel, 1981), and related work has been done with polyrhythm (Handel, 1984, 1989; Pressing,
Summers, & Magill, 1996; Summers, Rosenbaum, Burns, & Ford, 1993). Rhythmic complexity, in
general, can be defined according to how a pattern's structure fits within a metric framework, i.e., a
cognitive/motor schema consisting of hierarchically nested pulsations generated in the performer. In
situations requiring prioritised integrative attending, multipart rhythmic complexity concerns the
degree to which one's own part and the aggregate structure can be accommodated by the same metric
framework.
Combined effects of spatial and temporal compatibility arise within textural factors such as density,
i.e., the number of parts in the ensemble, and how differentiated they are in terms of rhythm (even if
low in complexity), tessitura, and timbre. Between-part grouping should generally deteriorate with
increases in the number of parts and differences in rhythm, tessitura, and timbre.
The findings of a survey-based study conducted by Keller (2000b) with practising ensemble
musicians are consistent with the above ideas about difficulty and compatibility. Musicians were
asked to rate how influential various extramusical factors (anxiety, arousal, and technical mastery)
and musical factors (complexity relating to rhythm, texture, and several pitch-related factors) were
upon their ability to engage in prioritised integrative attending. A particular order of importance
emerged, with extramusical factors generally being rated as more influential than musical factors.
Within the musical factors, rhythm and texture were considered to be most influential, followed by
tonality, melodic contour, and pitch-interval size. In response to open-ended questions about
situations in which prioritised integrative attending is compromised, musicians listed several
additional influential factors (e.g., an uncomfortable performance environment and poor balance in
terms of loudness between parts). The most noteworthy outcome of this study is that the rhythmic
complexity of one's own part and the rhythmic complexity of the relationship between parts were
claimed to be particularly influential. This finding implies that metric frameworks may have a special
role in prioritised integrative attending. Specifically, metric framework generation may facilitate the
processing and representational efficiency, and thereby the attentional flexibility, required in
ensemble performance (Keller, 1999).
A weakness of the above approach is that it relies solely upon introspective self-reports by musicians.
Nevertheless, the proposed link between metric frameworks and prioritised integrative attending is
supported by some recent, more rigorous experimental research (see Keller, 1999). The experiments
employed novel dual task paradigms that were intended to simulate the attentional demands arising in
ensemble performance. Both recognition memory and reproduction-based tasks were used. In all
tasks, participants were presented with multipart patterns, and required to attend simultaneously to a
particular 'target' part and the aggregate structure. The target parts comprised either metrical patterns
(i.e., patterns that fit a metric framework) or nonmetrical patterns (i.e., patterns that do not fit a metric
framework), and the aggregate structures were always metrical. Across experiments it was found that
the processing of aggregate structures suffered greater interference when target parts were
nonmetrical than when they were metrical. Thus, attentional flexibility was enhanced when
participants were able to use a common metric framework for both components of prioritised
integrative attending.
Mechanisms underlying prioritised integrative attending
In ARAMEP, it is claimed that metric framework generation provides an attentional scheme that
guides the tracking, between-part grouping, representation formation, and retrieval processes that
were identified earlier as sub-skills of prioritised integrative attending. The notion of meter as an
attentional scheme is a central concern in Mari Riess Jones' dynamic attending approach to rhythmic
behaviour (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999). It is proposed in the dynamic
attending approach that attentional activity fluctuates lawfully in response to the structure of musical
patterns. Specifically, attentional energy surges towards temporal locations at which events are
expected to occur. In metrical patterns, due to underlying periodicities, events are statistically more
likely to occur at strong metric locations (such as the beginning of bars and beats) than at weak
locations (Palmer & Krumhansl, 1990). Therefore, when processing metrical patterns, expectancies
occur periodically and the attender's 'biological rhythms' synchronise with the pattern's structure.
Consequently, the attender experiences periodic fluctuations in attentional activity that mirror metric
structure. In other words, there is a greater attentional activity at strong metric locations than at weak
locations. Gjerdingen (1989, p. 78) has suggested that it might even appear as if the attender is
"'paying more attention' to events occurring on strong beats".
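The idea of metrically graded attention can be illustrated with a small, purely hypothetical sketch (the function and the numeric levels below are invented for illustration and are not drawn from the dynamic attending literature): timepoints that begin larger metric units coincide with more metric levels and so receive more 'attentional weight', peaking at the downbeat.

```python
# Hypothetical sketch: metric strength as nested periodicities. A position's
# strength is the number of metric levels (bar > half-bar > beat > subdivision)
# whose cycle begins there, mirroring the claim that attentional activity is
# greater at strong metric locations than at weak ones.

def metric_strengths(positions=16, levels=(16, 8, 4, 2, 1)):
    """Count how many metric levels coincide at each of `positions`
    equally spaced timepoints in one bar (levels given as period lengths)."""
    return [sum(1 for period in levels if pos % period == 0)
            for pos in range(positions)]

strengths = metric_strengths()
# The downbeat (position 0) carries the most weight; isolated sixteenth-note
# positions carry the least.
```

Under this toy scheme, attentional energy would simply be read off as proportional to each position's strength.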
Although the concept of metric fluctuations in the activity of attentional resources is generally useful
when accounting for efficiency and flexibility in processing, it is not sufficient for explaining
prioritised integrative attending. If it is assumed that resources are limited in the sense that there is a
level ceiling on resource capacity, then an increase in attentional activity at strong metric locations
would bring resource consumption closest to full capacity at the corresponding points in time. This
would logically lead to greater scarcity of the resources necessary for processes such as tracking other
parts and between-part grouping at strong, relative to weak, metric locations. This proposition is
counterintuitive, as it implies that it should be more difficult for ensemble performers to engage in
prioritised integrative attending at these specific locations. Although this has not been investigated
directly, research on asynchrony between ensemble members (e.g., Rasch, 1988; Shaffer, 1984)
appears to indicate that this is not necessarily the case (see Keller, 2000a).
Therefore, to accommodate prioritised integrative attending, ARAMEP incorporates a two-factor
account of attentional resource allocation that specifies how variations in resource availability
compensate for fluctuating resource activity. In this account, resource availability refers to the
proportion of the attender's resources that are free to serve in a given task at a particular point in time,
and resource activity refers to the proportion of the attender's available resources that are actually
employed in the service of the task.
It is proposed here that, to overcome capacity limitations, both resource availability and resource
activity have the potential to be modulated in tandem in a manner that is highly plastic and efficient.
In musical contexts, this potential is released by the generation of metric frameworks. This conception
of resource allocation, which is depicted in Figure 1, is based upon several assumptions. First,
resource availability fluctuates across timepoints in a manner that mirrors the profile of metric
frameworks if an appropriate cognitive/motor schema is invoked. Second, the default pattern of
resource activity mirrors the distribution of resource availability across time. Furthermore, activity is
regulated by a feedback mechanism that is sensitive to availability limits, and thus functions to ensure
that activity remains within these limits.
This tight relationship between resource availability and resource activity enables the efficient
processing of metrical patterns. In accordance with the dynamic attending approach (e.g., Large &
Jones, 1999), it is assumed here that resource activity becomes focused at space/time regions where
target events occur, or are expected to occur. Consequently, due to variability in the concentration of
events at different metric locations in music, resource activity is usually increased at strong locations.
When resource availability and resource activity operate in concert, however, compensatory increases
in availability accompany the momentary increases in activity. Therefore, sufficient resources are
available so long as the pattern continues to conform to the established metric structure, as a greater
proportion of resources is 'on standby' at strong metric locations. Note that this account differs from
the dynamic attending concepts described earlier mainly in that it addresses resource availability in
addition to resource activity: both factors come to share metric structure. The current account also
deviates from traditional resource theory in that it emphasises lawful fluctuations in resource
availability at the timescale where meter resides.
Figure 1: Both resource availability and resource activity mirror metric structure (the dots represent
metric pulsations).
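A minimal sketch of this coupled allocation, assuming a simple proportional scaling (both the function and all numbers below are invented for illustration): availability tracks metric strength, and the feedback mechanism caps activity so that it never exceeds the momentary availability.

```python
# Hypothetical simulation of the two-factor account for a metrical pattern:
# resource availability mirrors metric strength, and resource activity follows
# processing demand but is clipped by a feedback mechanism sensitive to
# availability limits.

def allocate(metric_strengths, demand):
    """Return (availability, activity) per timepoint. `demand` is the raw
    activity each event would consume; activity is capped at availability."""
    availability = [0.2 * s for s in metric_strengths]   # assumed scaling
    activity = [min(d, a) for d, a in zip(demand, availability)]
    return availability, activity

avail, act = allocate([5, 1, 2, 1, 3, 1, 2, 1],
                      [0.9, 0.1, 0.3, 0.1, 0.7, 0.1, 0.3, 0.1])
# Activity stays within availability at every timepoint: the allocation
# remains coupled so long as the pattern conforms to the metric structure.
```

Decoupling, on this sketch, would correspond to holding availability flat (as for a nonmetrical pattern) while demand continues to fluctuate, so that demand frequently meets or exceeds the cap.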
The two-factor conception of resources described above becomes particularly useful when attempting
to explain resource allocation during prioritised integrative attending. At weak metric locations,
resource activity associated with processing one's own part is typically relatively low, and therefore
the performer is free to track other parts and engage in between-part grouping. At strong metric
locations, even though resource activity associated with processing one's own part may be higher,
tracking and between-part grouping are enabled by increased resource availability. Thus, the efficient
distribution of attentional resources provides a foundation for flexibility in attending. These benefits
cease to exist in the absence of metric framework generation.
It is assumed in ARAMEP that resource availability is no longer modulated systematically when
metric frameworks, or other appropriate schemas, are not generated. Resource availability is instead
constant over time (or characterised by minute random fluctuations) in these situations - such as when
attention is directed to a nonmetrical pattern. Nevertheless, resource activity continues to fluctuate,
typically quite wildly, in response to the unfolding nonmetrical pattern. Thus, resource activity and
resource availability may become decoupled: fluctuations in activity are not compensated for by
corresponding fluctuations in availability.
Furthermore, when attending to a nonmetrical pattern, resource availability should eventually begin to
decrease. Expectations about the temporal location of events comprising nonmetrical patterns lack
precision. Therefore, there must be a corresponding increase in the size of the temporal region during
which the attender is prepared for the events. As is the case in metrical patterns, this preparedness is
manifested as increased resource activity. However, in nonmetrical patterns, these relatively high
levels of resource activity must extend over regions of greater duration than those circumscribed by
strong locations in metric frameworks. Based upon the assumption that sustained focused activity is
effortful and leads to decreases in resource availability (see Koelega, 1996), the present account
postulates that adequate increases in resource availability are not sustainable over the extended
regions of high resource activation demanded by nonmetrical patterns.
In any case, when attending to nonmetrical patterns, resource availability and resource activity are less
likely to be correlated than when attending to metrical patterns. This independence becomes
especially problematic in multipart contexts where prioritised integrative attending is required. This is
because resource activity associated with attending to nonmetrical target parts frequently nears, or
even meets, resource availability. Recovering from these frequent disturbances interferes with
processing and leaves little scope for the flexible attending that underlies tracking other parts and
between-part grouping.
Interactions of factors and mechanisms
The primary goal of ARAMEP is to explicate how the musical and extramusical factors identified
earlier interact with the resource availability and resource activity mechanisms described above. It is
claimed that this interaction is initiated when the factors impact upon the 'state' of the performer and
their environment, i.e., the surrounding musical context. ARAMEP is concerned with three aspects of
the performer/context relationship: intrinsic cognitive-motivational, intrinsic executive, and extrinsic
states.
Intrinsic states in general occur within the performer, and include the degree to which his or her
perceptual, cognitive, and motor resources are prepared to deal with the ensemble interaction at hand.
Two varieties of intrinsic state are distinguished: cognitive-motivational and executive. Intrinsic
cognitive-motivational state refers to high level cognitive phenomena such as attentional sets, and
motivational or emotional factors such as mood. Musical factors that contribute to intrinsic
cognitive-motivational state include performance goals and musical schemas - e.g., metric
frameworks and abstract knowledge of tonality. Extramusical factors that influence intrinsic
cognitive-motivational state include anxiety, arousal, and motivation. Intrinsic executive state
incorporates the strategies that are available to the performer for producing target behaviour. Intrinsic
executive state is defined jointly by the performer's mastery of instrumental technique and the
schemas that guide their motor actions (e.g., performance plans).
In contrast to the intrinsic states, extrinsic state is a product of the performer's external environment,
which comprises the ensemble sound, physical surroundings, and ambient social context. Extrinsic
state is affected by musical factors such as rhythm, texture, tonality, melodic contour, and
pitch-interval size, and extramusical factors such as acoustic conditions, lighting, background noise,
and comfort of the performance space.
It is proposed here that the three performer/context states are linked systematically to resource
availability and resource activity (see Figure 2). Intrinsic cognitive-motivational state modulates both
resource availability and resource activity, whereas extrinsic and intrinsic executive states have a
direct influence only upon resource activity. In other words, the temporal profile of resource activity
is determined by the full range of musical and extramusical factors, but the timecourse of resource
availability is determined exclusively by schematic musical factors - metric frameworks in particular -
and physiological extramusical factors - anxiety, arousal, and motivation. Nevertheless, resource
availability may be affected indirectly by the other factors through causal links between the three
performer/context states (see Figure 2). Extrinsic state affects both intrinsic executive state (e.g.,
comfort may influence technique) and intrinsic cognitive-motivational state (e.g., ensemble sound
structure determines which schemas are invoked, and social context influences anxiety); intrinsic
cognitive-motivational state modulates intrinsic executive state (e.g., anxiety affects motor control);
and intrinsic executive state influences extrinsic state (e.g., motor control affects ensemble sound). A
feedback link from intrinsic executive system to the intrinsic cognitive-motivational system is also
assumed to exist for the endogenous regulation of motor control (e.g., error correction based on
proprioceptive feedback, rather than external acoustic sources).
Figure 2: Interaction between performer/context states and resource availability and resource activity.
The connections between performer/context states and resource availability and resource activity in
ARAMEP embody predictions about how the musical and extramusical factors under discussion
interfere with prioritised integrative attending. Basically, different factors lead to three different types
of interference: (Type A) decreased resource availability, (Type B) increased resource activity, or
(Type C) decoupled availability and activity. Type A interference occurs when global decreases in
resource availability are brought about by factors such as extreme high or low levels of anxiety and
arousal (see Kahneman, 1973). On the other hand, the increases in resource activity that characterise
Type B interference are related to rhythmic, textural, and pitch-based complexity. Finally, in Type C
interference, the decoupling of resource availability and resource activity occurs exclusively as a
result of rhythmic complexity. Interactive effects may also arise through combinations of musical and
extramusical factors. Rhythmic complexity should be a particularly potent contributor in these
interactions, given the intimate relation between metric frameworks and resource allocation
mechanisms postulated in ARAMEP. If this claim is valid, then music educational techniques for
dealing with rhythmic complexity by optimising metric framework generation should facilitate the
development of prioritised integrative attending skills.
Conclusions
Prioritised integrative attending in ensemble performance is a multifaceted skill composed of
sub-skills including tracking multiple sound sources and grouping together their elements in order to
derive the aggregate structure. ARAMEP accounts for how the attentional flexibility required for
these sub-skills is influenced by a range of musical factors (e.g., rhythmic and pitch-based
complexity) and extramusical factors (e.g., anxiety and arousal). The central claim is that these factors
act directly upon cognitive/motor mechanisms that regulate attentional resource allocation during
performance. The role of metric framework generation is paramount in this process. Through metric
framework generation, resource availability and resource activity are modulated - in real time and
continuously - in a manner that is both plastic and efficient, and hence conducive to attentional
flexibility. Thus, the sub-skills involved in prioritised integrative attending encounter minimal
interference when availability and activity maintain a lawful relationship. However, the decoupling of
resource availability and activity that occurs in the absence of metric frameworks curbs attentional
flexibility. The challenge for future research is to demonstrate empirically the distinction between
coupled and decoupled resource availability and activity.
By identifying the factors and mechanisms underlying prioritised integrative attending, ARAMEP has
potential to serve as a heuristic for directing research into ensemble skills. It also highlights the
complexity of the task faced by ensemble musicians. ARAMEP implies that ensemble performance
requires special capabilities that transcend the skills commonly identified in music education as
indices of a performer's general musical ability (see Shuter-Dyson, 1999). Technical competence as an
instrumentalist, accurate perceptual-motor skills, and artistry as an interpreter of music are necessary,
but by no means sufficient, for excellence as an ensemble performer. Specific prioritised integrative
attending sub-skills are fundamental to ensemble cohesion and coherence.
References
Allport, D.A., Antonis, B., & Reynolds, P. (1972). On the division of attention: A
disproof of the single channel hypothesis. Quarterly Journal of Experimental
Psychology, 24, 225-235.
Bharucha, J.J., & Pryor, J.H. (1986). Disrupting the isochrony underlying rhythm: An
asymmetry in discrimination. Perception and Psychophysics, 40, 137-141.
Bregman, A.S. (1990). Auditory scene analysis. Cambridge, Massachusetts: The MIT
Press.
Casey, J.L. (1991). Teaching techniques and insights for instrumental music educators.
Chicago: GIA Publications.
Crowder, R.G. (1993). Systems and principles in memory theory: Another critique of
pure memory. In A.F. Collins & S.E. Gathercole (Eds.), Theories of memory (pp.
139-161). Hove, UK: Lawrence Erlbaum Associates.
Damos, D.L. (1991). Dual-task methodology: Some common problems. In D.L. Damos
(Ed.), Multiple-task performance (pp. 101-119). London: Taylor & Francis.
Dawe, L.A., Platt, J.R., & Racine, R.J. (1993). Harmonic accents in inference of metrical
structure and perception of rhythm patterns. Perception and Psychophysics, 54, 794-807.
Dowling, W.J. (1978). Scale and contour: Two components of a theory of memory for
melodies. Psychological Review, 85, 341-354.
Dowling, W.J., & Bartlett, J.C. (1981). The importance of interval information in long
term memory for melodies. Psychomusicology, 1, 30-49.
Dowling, W.J., Kwak, S., & Andrews, M.W. (1995). The time course of recognition of
novel melodies. Perception and Psychophysics, 57, 136-149.
Drake, C., & Palmer, C. (2000). Skill acquisition in music performance: Relations
between planning and temporal control. Cognition, 74, 1-32.
Essens, P.J. (1995). Structuring temporal sequences: Comparison of models and factors
of complexity. Perception and Psychophysics, 57, 519-532.
Fink, I., & Merriell, C. (with the Guarneri String Quartet). (1985). String quartet playing.
Neptune City, NJ: Paganiniana Publications.
Gabrielsson, A. (1999). The performance of music. In D. Deutsch (Ed.), The Psychology
of Music (2nd ed.) (pp. 501-602). San Diego, CA: Academic Press.
Gjerdingen, R.O. (1989). Meter as a mode of attending: A network simulation of
attentional rhythmicity in music. Intégral, 3, 67-91.
Goodman, E. (1998, July). 1 + 1 = 2? The ensemble performance of Chopin's Cello
Sonata. Paper presented at the Tenth International Conference on Nineteenth-Century
Music, University of Bristol, United Kingdom.
Goodman, E. (2000, March). Ensemble rehearsal: Analysis and psychology in practice.
Paper presented at the SMA and SPRMME Study Day 'Pathways to Musical
Understanding: Analysis and Psychology in Conjunction', University of Reading, United
Kingdom.
Gruson, L.M. (1988). Rehearsal skill and musical competence: Does practice make
perfect? In J.A. Sloboda (Ed.), Generative processes in music (pp. 91-112). Oxford:
Clarendon Press.
Handel, S. (1973). Temporal segmentation of repeating auditory patterns. Journal of
Experimental Psychology, 101, 46-54.
Handel, S. (1984). Using polyrhythms to study rhythm. Music Perception, 1, 465-484.
Sloboda, J.A. (1982). Music performance. In D. Deutsch (Ed.), The Psychology of Music
(pp. 479-496). New York: Academic Press.
Sloboda, J.A. (1985). The musical mind. Oxford: Oxford University Press.
Sloboda J.A. (1996). The acquisition of musical performance expertise: Deconstructing
the "talent" account of individual differences in musical expressivity. In K.A. Ericsson
(Ed.), The road to excellence: The acquisition of expert performance in the arts and
sciences, sports, and games (pp. 107-126). Mahwah, NJ: Lawrence Erlbaum.
Smith, K.C., & Cuddy, L.L. (1989). Effects of metric and harmonic rhythm on the
detection of pitch alterations in melodic sequences. Journal of Experimental Psychology:
Human Perception and Performance, 15, 457-471.
Smyth, M.M., Morris, P.E., Levy, P., & Ellis, A.W. (1987). Cognition in action. London:
Lawrence Erlbaum Associates.
Summers, J.J., Rosenbaum, D.A., Burns, B.D., & Ford, S.K. (1993). Production of
polyrhythms. Journal of Experimental Psychology, 19, 416-428.
Thompson, W.F., & Mor, S. (1992). A perceptual investigation of polytonality.
Psychological Research, 54, 60-71.
Wickens, C.D. (1980). The structure of attentional resources. In R. Nickerson (Ed.),
Attention and Performance VIII (pp. 239-257). Hillsdale, NJ: Erlbaum.
Wickens, C.D. (1984). Processing resources in attention. In R. Parasuraman & D.R.
Davies (Eds.), Varieties of attention (pp. 63-102). Academic Press.
Wickens, C.D. (1989). Attention and skilled performance. In D. Holding (Ed.), Human
skills (pp. 71-105). New York: John Wiley & Sons.
Wickens, C.D. (1991). Processing resources and attention. In D.L. Damos (Ed.),
Multiple-task performance (pp. 3-34). London: Taylor and Francis.
Proceedings Keynote
parents, and have high academic expectations. In addition to demographic differences, it is possible
that the students who participate in music instruction for an extensive period of time have different
interests or personal characteristics than do students who never take or discontinue music lessons. The
present study was conducted with children who did not seek to participate in formal music instruction
and who came from a less privileged environment than the one described by Duke et al. Its purpose
was to investigate the effects of three years of piano study on their cognitive development, academic
achievement, and self-esteem. A more detailed description of the study's sample, methodology, and
results regarding the cognitive benefits of piano instruction can be found in Costa-Giomi (1999).
Methodology
Sample
One hundred and seventeen 4th-grade children (58 girls and 59 boys) who had never participated in
formal music instruction, did not have a piano at home, and whose family income was below $40,000
Cdn (approximately $27,000US) per annum participated in the study. Thirty percent of the children
lived in single-parent families and 25% had unemployed parents whose welfare subsidies were less
than $20,000. Sixty-seven children were assigned to the experimental group and 50 children to the
control group.
Treatment
Each child in the experimental group received, at no cost to the families, three years of individual
piano lessons and an acoustic piano. The lessons, which were taught at the children's schools, were 30
minutes long during the first two years and 45 minutes during the third year. Nine experienced
teachers followed a traditional piano curriculum characteristic of Canadian conservatories.
Testing
Prior to the treatment, children in both the control and experimental groups were administered five
standardized tests with adequate reliability levels for the age of the sample: Level E of the Developing
Cognitive Abilities Test (DCAT), the tonal and rhythmic audiation subtests of the Musical Aptitude
Profile, the fine motor subtests of the Bruininks-Oseretsky Test of Motor Proficiency, the language
and mathematics subtests of the level 14 of the Canadian Achievement Test 2 (CAT2), and the
Coopersmith Self-Esteem Inventories (long form). At the end of the first, second, and third year of
instruction, children took the appropriate level of the DCAT (i.e., E, F, and G levels respectively), and
the self-esteem test. At the end of the second and third year of piano instruction, children also took the
language and math subtests of the CAT2, levels 15 and 16 respectively. Tests were administered to
mixed groups of experimental and control subjects in the same order in all schools. I studied children's
academic performance in school through the analysis of their school report cards from third grade
(one year prior to the start of piano instruction) to sixth grade and focused on the effects of piano
instruction on four subjects: music, math, English, and French. The piano teachers completed weekly
progress reports on children's attendance at the lessons and their practice routines throughout the
three years of instruction.
Results
In order to determine the effects of piano instruction on cognitive abilities, self-esteem, and academic
achievement, I compared the experimental and control groups throughout the duration of the project
through ANOVAs with repeated measures followed by Scheffé post hoc comparisons. I conducted
these analyses with a number of additional independent variables: sex, income (<$20,000, $20,000 -
$30,000 Cdn., or $30,000 - $40,000 Cdn.), family structure (single- or two-parent family), and
parental employment (0, 1, or 2 employed parents). I also compared the children who dropped out of
the piano lessons during the duration of the study with those who never participated in formal music
instruction and who participated in the piano lessons for three years.
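The study's analyses were repeated-measures ANOVAs with Scheffé post hocs. As a much-simplified sketch of the underlying logic only (all scores below are invented, and only a one-way between-groups F is computed, not the repeated-measures design actually used), the F statistic compares between-group to within-group variability:

```python
# Hypothetical, simplified illustration of a one-way group comparison;
# the scores are invented and do not come from the study.

def one_way_f(groups):
    """F statistic for a one-way ANOVA over lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-group sum of squares: group means vs the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: scores vs their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_stat = one_way_f([[102, 98, 105, 110], [95, 97, 92, 100]])
# Larger F values indicate a bigger between-group difference relative to
# within-group variability.
```

A repeated-measures design additionally partitions out between-subject variability across testing occasions, which is why the study could track group differences year by year.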
Cognitive abilities
Results showed that the experimental group's spatial scores were significantly higher than those of the
control group after one and two years of instruction and that the groups' spatial scores did not differ
prior to the treatment or at the end of the intervention. After two years of treatment, the general
cognitive ability scores of the experimental group were also significantly higher than those of the control
group. No differences in quantitative and verbal abilities between the control and experimental groups
could be established throughout the duration of the project. Gender, income, family structure, and
parental employment did not interact with the treatment in a significant way. No differences between
the drop-out and control groups or between the drop-out and experimental groups could be established
for any of the subtest scores or the total scores of the DCAT.
I conducted multiple regression analyses to explore the effects of specific components of the treatment
on the cognitive development of children in the experimental group. These components were
dependent upon the subjects' motivation to learn the piano, a variable which could have affected the
relationship between music and cognitive development. Motivation to learn the piano, as measured by
lessons missed and average practice time per week, explained 21% of the variance in spatial abilities
and 22% of the variance in total cognitive abilities after three years of piano instruction.
Self Esteem
The results of the analyses showed that the total self-esteem scores of the experimental group
increased significantly from 1994 to 1997 but those of the control group did not. It was noticed that
the School self-esteem scores of the experimental group tended to increase throughout the three years
while those of the control group tended to decrease. The analyses which included gender, income,
family structure, and parental employment as additional variables yielded similar results to the ones
reported earlier and showed no interactions between piano instruction and these variables. Further
analyses indicated that the total self-esteem scores of children who completed the three years of
instruction improved significantly, while those of the dropout and control groups did not.
Academic achievement
The results of the analyses did not show any significant effect of piano instruction on children's total
scores in the language and math subtests of the academic achievement test. The analyses of partial
scores in each of the two math subtests and two language subtests revealed no significant difference
between the control and experimental groups. It was noticed that the math computation scores of the
experimental group tended to be higher than those of the control group especially after two years of
instruction. The analyses which included gender, income, family structure, and parental
employment as additional variables yielded similar results to the ones reported earlier. The analysis of
math computation scores which included income as an independent variable showed a significant
interaction between Year and Group which was not found in the previous analysis. There was a more
pronounced improvement in the experimental group's math computation scores than in those of the
control group.
When re-analyzing the data to study differences in academic achievement among the children who
completed the three years of piano lessons and those who dropped out of or never participated in
piano instruction, it was found that the experimental group obtained higher total language scores than
the control group after two years of instruction.
School performance
The analyses of data from the school report cards showed that piano instruction affected children's
school music marks. The experimental group obtained significantly higher music marks than the
control group after two years of instruction. It was also found that the marks of the control group
varied significantly throughout the three years of the project while those of the experimental group did
not.
Although the analysis of math marks also showed a significant effect of piano instruction on school
math performance, post-hoc analyses did not reveal any differences between the control and
experimental groups. Similarly, no effects of piano instruction on children's school performance in
language subjects were found even when including sex, income, family structure, and parental
employment as additional independent variables into the analyses. The consideration of these
variables did not modify the results presented regarding children's math and music marks either. The
analyses of school marks of children who completed, did not complete, or never participated in the
three years of piano instruction, yielded similar results to the ones reported earlier. The dropout
group's marks did not differ from either the control group's or the experimental group's marks.
Other effects
The analysis of data gathered through regular interviews with the parents, teachers, and students
showed a few interesting trends. Almost half of the parents of children in the experimental group
reported that during the three years of the project they provided their other children with music
lessons. With the exception of one child in the control group, children who expressed interest in
pursuing music careers were all in the experimental group. More children in the experimental group
than in the control group showed interest in playing musical instruments at the end of the project.
Parents and children in the experimental group attended more concerts and recitals than those in the
control group throughout the three years of the project. No differences in personal traits, as rated by
the parents, between the control and experimental groups could be established either before or after
the treatment.
Discussion
The results of the study suggest that one of the benefits associated with piano instruction is the
development of children's self-esteem. The total self-esteem scores of the experimental group showed
a more pronounced (and statistically significant) improvement than those of the control group during
the three years of instruction regardless of family income, sex, family structure, and parental
employment. It is important to mention that the treatment of this project involved not only the piano
lessons but also many other special events such as owning a piano, playing in recitals, and getting
individual attention from a caring teacher. Although traditional piano instruction involves all these
elements and, as such, contributes to the development of self-esteem in children, future research might
try to establish the individual contribution of each of these elements.
As discussed elsewhere (Costa-Giomi, 1999), the results of the study corroborate that piano
instruction produces temporary improvements of general and spatial cognitive abilities. Children
receiving piano lessons obtained significantly higher general and spatial cognitive scores than those
not participating in formal music instruction after one and two years of treatment. However, no
differences between the groups could be established after the third year of instruction. Additional
findings suggest that motivation to study piano plays an important role in the relationship between
music instruction and cognitive development. Apparently, children who apply themselves benefit to a
larger extent than do those who do not practice or miss lessons.
The results of the study did not show any significant contribution of piano instruction to children's
academic achievement in language and math as measured by standardized tests. I noticed, however,
that children receiving the piano lessons tended to obtain higher math computation scores than those
not participating in formal music instruction after two years of treatment and that those who
completed three years of piano instruction obtained higher language scores than those who
discontinued the lessons.
The effects of piano instruction on children's academic achievement in school, as measured by school
report cards, were similar to the ones measured through standardized tests. No benefits of piano
instruction were evident from the analyses of school marks in language subjects and math. Although
the math marks of the children receiving and not receiving the lessons changed significantly from
third grade to sixth grade, the results did not indicate any clear overall gain or loss for either group.
The lack of other effects of piano lessons on school performance is in fact a positive sign that
this type of instruction does not necessarily add excessive strain to children's academic
responsibilities. Children in the experimental group were able to meet the academic demands of
school on top of those associated with the study of piano.
Participation in piano instruction had other interesting effects on children and their families. Families
whose children were taking lessons showed a greater interest in music participation than those
whose children were not engaged in formal music instruction. The siblings of almost half of the
children who were offered the piano lessons through this project subsequently started formal music
instruction at their parents' initiative, and families with children involved in the piano lessons attended
more concerts than those with children in the control group. Participation in music instruction actually
opened other career options for the children in the experimental group, options not even considered by
those with no formal music instruction experience.
In summary, the results of the study show that piano instruction benefits children in various ways but
that the scope of these benefits may be more limited than previously suggested.
References
Costa-Giomi, E. (1999). The effects of three years of piano instruction on children's cognitive
development. Journal of Research in Music Education, 47, 198-212.
Duke, B., Flowers, P., & Wolfe, D. (1997). Children who study piano with excellent teachers in the
United States. Bulletin of the Council for Research in Music Education, 132, 51-85.
Harding, J. A. (1990). The relationship between music and language achievement in early childhood.
Dissertation Abstracts International, 52 (10A), 3148.
Hurwitz, I., Wolff, P., Bortnick, B., & Kokas, K. (1975). Nonmusical effects of the Kodaly music
curriculum in primary grade children. Journal of Learning Disabilities, 8(3), 45-52.
Kemp, A. (1996). The Musical Temperament: Psychology and Personality of Musicians. New York,
NY: Oxford University Press.
Kooyman, R. J. (1989). An investigation of the effect of music upon the academic, affective, and
attendance profile of selected fourth grade students. Dissertation Abstracts International, 49 (11-A),
3265.
Lamar, H. B. (1990). An examination of the congruency of music aptitude scores and mathematics
and reading achievement scores of elementary children. Dissertation Abstracts International, 51
(3-A), 778-779.
Legette, R. M. (1994). The effect of a selected use of music instruction on the self-concept and
academic achievement of elementary public school students. Dissertation Abstracts International, 54
(7-A), 2502.
Linch, S. A. (1994). Differences in academic achievement and level of self-esteem among high
school participants in instrumental music, non-participants, and students who discontinue instrumental
music education. Dissertation Abstracts International, 54 (9-A), 3362.
Lomen, D. O. (1970). Changes in self-concept factors: A comparison of fifth-grade instrumental
music participants and nonparticipants in target and nontarget schools in Des Moines, Iowa.
Dissertation Abstracts International, 31, 3962A.
Michel, D. E. (1971). Self-esteem and academic achievement in black junior high school students:
effects of automated guitar instruction. Council for Research in Music Education, 24, 15-23.
Michel, D. E. & Farrell, D. M. (1973). Music and self-esteem: disadvantaged problem boys in an all
black elementary school. Journal of Research in Music Education, 21, 80-84.
Persellin, D. (2000, March). The effect of activity-based music instruction on spatial-temporal task
performance of young children. Paper presented at the National Biennial Music Educators National
Conference. Washington, DC.
Rauscher, F., Shaw, G., Levine, L., Ky, K., & Wright, E. (1994, August). Music and spatial task
performance: A causal relationship. Paper presented at the Annual Convention of the American
Psychological Association, Los Angeles, CA.
Schreiber, E. H. (1988). Influence of music on college students' achievement. Perceptual and Motor
Skills, 66, 338.
Wagner, M. & Menzel, M. (1977). The effect of music listening and attentiveness training on the
EEGs of musicians and nonmusicians. Journal of Music Therapy, 14, 151-164.
Wamhoff, M. J. (1972). A comparison of self-concept of fourth grade students enrolled and not
enrolled in instrumental music in selected schools in the Barstow, California School District.
Dissertation Abstracts International, 33, 626A.
Wood, A. L. (1973). The relationship of selected factors to achievement motivation and self-esteem
among senior high school band members. Dissertation Abstracts International, 35, 1150A.
Ilari, B. Motivation and expertise: The role of teachers, parents and ensembles in the development of instrumentalists
Miura, M. A conceptual design of a CAI system for basse donnee in harmony theory
Mullensiefen, D. The effects of background music on memory under the perspective of the irrelevant speech effect and context-dependent memory
Pitts, S. Starting to play a musical instrument: parents' and children's expectations of the learning process
Stepanauskas, D. The social cost of expertise: personality differences in string players and their implications for the audition process and musical training
Proceedings paper
'A music system, its style, its main characteristics, its structure are all very closely associated with the
particular way in which it is taught. Not only what is taught but also the activities involved in learning
can tell us what is valued in a music.' (Bruno Nettl 1983, 331-332).
'Colourstrings' teaching is specially designed as pre-school instrumental education. The method was
first developed for violin by Dr. Géza Szilvay in the early 1970s. My research is based on the Violin
ABC books and on practical observations of lessons at East Helsinki Music School, Finland. In this
paper, I shed light on two important ideas related to the cross-modal associations that characterise
this education-to-performance approach.
In his 'Colourstrings' teaching, Szilvay's first goal is to construct the hearing of the young child. Many
of the ideas developed in his instrumental approach for violin are an extension and application of
Kodaly's principles. Szilvay does not use conventional music notation during the initial year(s) of
learning. The first page of book A (fig. 1.) is a picture the child could even have drawn him/herself
when he/she was confronted with the sound of the open violin strings for the first time. It seems as if
the child has differentiated the four specific timbres of the strings, interpreted and expressed
him/herself through a medium he/she is familiar with: drawing pictures and using colours. As such,
the code is fully comprehensible; the child understands the relation between the pictorial code and
sound instantly. These four connoted characters awaken the interest and involvement of the child and
play a crucial part in the creative processes of structured hearing and musical cognition while playing
the violin.
Fig. 1. : The visual representation of the auditory sound image of the four violin strings: the opening
page of book A of the Violin ABC of Dr. G. Szilvay.
The musical code, designed for and by adults, is often forced prematurely upon the pre-school
instrumentalist. With this background, I am particularly keen to defend the meaningfulness of the code
as used in 'Colourstrings' when teaching pre-school children an instrument, in this case the violin.
My study addresses two issues:
-The importance of a colour code in preparation of sight reading conventional notation.
-The semantic meaning and importance of colours in stimulating the personal engagement of the
young player.
I have decided to investigate the stage when 'Colourstrings' pupils have adequately assimilated body
movements in relation to the colour code. The sound of the four strings, coded as a green bear, a red
father, a blue mother and a yellow bird, assists the children in identifying the sound and directing the
required position of the right arm. With practice, the movements become trained and ingrained. The
code functions as a memory aid and programs the inner body and inner ear in preparation for this
process. The young child masters the soundscape he/she is creating and re-creating. Colours and
pictures encode additional concepts needed in this process of re-creation. This code is comprehensible
to the five-year-old and inspires him/her to investigate further with ease. The child, curious and
motivated, practises more and, most importantly, shows enjoyment in practising.
Imagination is the first goal of music education: musical cognition is built on intersense
imagery.
The following tests were given in September 1999, after one month of tuition, and in February 2000,
after six months of tuition. The children were assessed in groups and placed in rows so that they could
not copy each other. The advanced group was evaluated after one year of playing, in September 1999.
For the beginners' groups, colours were added at critical places in string crossings. The most advanced
children did not need any colour indications and performed the string crossings without hesitation.
Test 1.
After one month of tuition, the six children of the beginners' group were tested in the following way:
Colours or characters were called out in a free order. With closed eyes, the pupils selected the
required position of the right arm, without making a sound so as not to reveal the 'answer' to the others.
This resulted in a 'miming' of the string attack and an upward circular right arm movement.
Test 2.
Six months later, the ability to read conventional notation was tested with the same group.
The beginners' group was confronted with new material, conventionally notated. The notation
included a coloured fingering where confusion about the string crossing might arise.
Test 3.
After one year of playing, the advanced group's level of sight-reading from conventional notation was
tested.
The group was confronted with a sight-reading task that included more than one string crossing.
Neither colours nor fingerings were given.
Results.
Test 1: after one month of 'Colourstrings' training, the string crossing reflexes were synchronised and
well assimilated by all.
Test 2: the passage was performed without hesitation when colours and/or coloured fingerings were
added.
Test 3: there proved to be no need to add colours above the conventional notation; all children
performed their string crossings as expected.
Test 1 reveals that from the first month of 'Colourstrings' tuition, the right arm's spatiotemporal
movements are accurately trained to perform string crossings. With closed eyes, the miming of the
string attack is faultless. The sight-reading results in tests 2 and 3 are different. In comparing tests 2
and 3, we notice that there is a transition period of approx. 6 months when a coloured fingering cues
string crossings. After one year of tuition, perfect assimilation of the right arm movements results in a
well-directed performance of colour-free sight-readings including string crossings.
In general, 'Colourstrings' pupils prove to be efficient sight-readers. The audiation or inner hearing is
first colour (timbre) specific before it is pitch specific. From imagining what should be heard and
what should have moved differently, the children gain control and become 'masters of themselves'.
Confronted with a colour, the child instantly processes aural and bodily imagery, anticipating in the
mind what the produced sounds should sound like and which movements are involved. The mental link
between the visual colours and the aural timbre of the string is an internal imaginative process. The
four chosen colours are a mnemonic device for practically overcoming string crossings but equally
function as a specific timbre semantic when shifting the left hand and playing on the same string are
involved.
Discussion.
Test 3 shows that the transition to conventional notation happens after approximately one year. As
children 'grow out of the colours', the temporary effect of 'coloured hearing' is not lasting. However,
as a mnemonic, colours simplify the process of comprehending and reinforce the development of
reading. Intuitively, the child is eager and demonstrates enjoyment: he/she becomes committed. I
would like to link the two issues of my study. The musical grammar, which Szilvay presents at the
child's level of thinking, is easy to decode. As such, the young player is stimulated to repeat the
experience and practises more. He/she is capable of encoding and eager to show this progress through
playing.
However, are colours and pictures the only mnemonic aid? Social learning - in 4 group lessons a week
is likely to play an important part in the progress of the groups studied here. According to Elliott
(1993), constructive knowledge is worthwhile, as the social environment stimulates it. The social
environment is first created by setting up a primary school within a music school, second by
organizing players of similar ability in the same group.
Dr. Szilvay was motivated to develop the use of colours because of the inadequacy of conventional
notation for the mind of the pre-school child. When the conventional code system as we know it does
not awaken any meaning in the young learner, the child does not see any application for it in relation to
his/her playing. By using 'a sign system based on previous experiences' (Dewey/Reichling), Szilvay
conveys to the child that the meaning of music is to reveal his/her inner world and communicate with
the outer world through playing. Szilvay's first message to the child is that music is an expression: it is
the revelation of inner feelings. Through colours, the child's
self-transformation is not based on mechanical routines and abstract comprehension, but is an active
process to which the child can respond affectively. What is transformed here as well is the mind
of the adult. 'Colourstrings' teaching depends on the teacher's ability to reach the artist in the child.
The teacher has to know 'how to 'music' and how to teach others 'to music.' (Elliott 1991-93, 24) The
application of colour in code, mind and sound demands in the first place an adaptation of the mind of
the adult/teacher to the mind of the child.
I have mentioned two other factors which need consideration when interpreting the observations and
tests done with the first violin class of East Helsinki Music School: social learning and the teacher
involved. I acknowledge that more factors are involved, for example the children's home background.
These issues I have deliberately limited in this presentation, but they remain open to further
discussion.
If colours and images speak, they can make the bow sing and a pizzicato chuckle.
There are three implications I want to highlight here. First, the importance of colours in developing
the identity of a string and its position, so that the process of sight reading, based on a faster
assimilation of body control, can take place at an early stage in the development of the young
instrumentalist. Second, my hypothesis is that colours can be seen as a semantic of interpretation and
thereby stimulate children to become improvisers and interpreters from the start. Third, music is
understood not only as a performance practice but also as the inner musical 'chat' the child has
developed in the process of structuring his/her hearing. The child has understood that he/she can encode an
imagined aural experience visually.
The application of colours in constructing the inner hearing has been applied to other instrumental
teaching. 'Colourstrings' methods exist for violin, cello, double bass, guitar, piano and flute. Each uses
colours to simplify the code to the pre-school child. Colours speak to the young child and intensify the
visual perception linked to specific aural timbre expectations. These will be heard in imagination so
that precise right arm movements can be directed. The colours then function as a mnemonic aid in
developing accurate sight-reading skills. As the 'colour hearing' does not last, there are no problems in
the transition to reading the conventional code. Tests and almost thirty years of teaching experience
worldwide have shown that children learn the conventional code faster, are better sight-readers and, as
such, can join string group activities at an early stage in their development. Through colours, the
learning process has been intensified.
Moreover, the meaning of the code and the musical theory is learned in a playful way. As the child
can make sense of colours, music reading becomes more valuable. In addition, colours speak to the
emotions of the child and suggest that the expression of inner feelings can add other dimensions to
decoding or reading. In my opinion, the emphasis on timbre, which Dr. Szilvay initiates in his
'Colourstrings' approach, is at the basis of the process of musicing. As Davidson claims: 'My choice of
string on which I play a given note now reflects my understanding of the role timbre plays in my
interpretation. As my options expand, the choices I make become increasingly significant. This
dramatically transforms my perspective of the musical experience as well as the nature of
performance.' (L. Davidson 1994, 103) Thus, visual colours as a semantic of the string timbre imply an
interpretation and technical ability in the service of expressing feelings. The child has
quasi-independently discovered how to music musically. The musical knowledge, the violin aesthetics and
the ability to imagine, be personally engaged and give an emotional response, are all combined in the
colour code. It is this total comprehension, without notes or words but through colour and pictures,
which is at the basis of making music in an intellectual and musical way.
It must be noted that, from as young as pre-school age, the child not only enjoys the practice of
decoding and re-creating, but equally improvises and interprets known or new tunes. He/she not only
does so through playing, but is equipped with the knowledge to encode what he/she has created. As
the child's musical and expressive knowledge and inner hearing have developed from listening for the
colours s/he was looking at, the choice of timbre according to his/her own inner feelings when
improvising a tune will correspond with a visual colour in the young composer's code. As the mind
makes new combinations, the child not only possesses the skills to play these almost simultaneously -
as in the case of improvisation - but can also encode the ongoing musical 'chat' in his/her mind.
Already from this stage, expressive meaning is engraved in the code. 'Music is born out of the
need to express ourselves and to communicate aesthetically through the abstractions and
characteristics of sound.' (R. Aiello 1994, 44). Music does not only mean sound, but from now on is
recognised as a system of coding.
In 'Colourstrings' teaching, Dr. Szilvay re-structures and re-values the instrumental approach of the
very young, based on the relative theories of Kodaly. He aims at developing a musically constructed
knowledge by applying a colour code appropriate to the level of comprehension of the child, who
becomes a musical master of him/herself from the very start. At this point, Szilvay goes beyond Kodaly:
even before reading notes, the pre-school violinist has grasped the concept of musicing in relation to a
compact but meaningful picture borrowed from his/her own familiar world and has experienced music
making and music coding/decoding in several dimensions.
REFERENCES
Aiello, R. (1994). Music and Language: Parallels and Contrasts. In R. Aiello (Ed.), Musical
Perceptions. New York: Oxford University Press.
Proceedings paper
Discussion of the beneficial effects of music on child development often centers on cognitive
development, such as spatial-temporal reasoning (e.g. Rauscher et al., 1997). In contrast, the present
research addresses the effects of musical activities on an area of social and emotional development,
the self-concept. This research investigates why some young children think of themselves as musical,
and hence are motivated to do music activities, whereas others do not think of themselves as musical.
Because music programs are often the first to go when the budgets of schools in the U.S. are cut, it is
important for us to understand the consequences of a lack of music education for the development of
children's musical self-concept and their possible involvement in future musical activities.
Even when elementary schools in the U.S. have music programs, it is common for children not to
have an opportunity to learn to play a musical instrument in school until fourth grade. Unfortunately,
by the time children are 9 years old, we may have missed valuable years in terms of the formation of
their self-concepts. According to Piaget, children enter concrete operations when they grasp the ideas
of conservation and seriation between the ages of 5 and 8 (Boden, 1979). Commensurate with these
newfound cognitive abilities, children begin to think of themselves and other people in terms of traits,
and to use social comparison to rank individuals, including themselves, relative to others in different domains
(Sameroff & Haith, 1996). For example, a child might go from thinking of someone as a person who
makes cookies and cakes to thinking of someone as a person who is good at baking. Because the
transition from thinking of oneself and others as doing things to thinking of oneself and others as
having abilities takes place between the ages of 5 and 8, it is critical for developmental psychologists
to study the early formation of self-concepts in children as young as 5 to 8 years old.
Unfortunately, however, most studies of children's self-concepts have been carried out with older
children, because it is easier to administer measures to children who can read. Although it is possible
to read 64 self-concept questions aloud to young children (Marsh, Craven, & Debus, 1991), it is
probably not too engaging a task for young participants. A fairly recent methodological innovation is
the use of puppets with very young children (Eder, 1990; Ablow & Measelle, 1993), but a pictorial
instrument is most widely used to study the self-concepts of young children (Harter & Pike, 1983).
The Pictorial Scale of Perceived Competence and Social Acceptance for Young Children (PSPCSA)
is developmentally appropriate and user-friendly, but neglects to measure any kind of musical or
artistic self-concept that children may have. For the present research, a new instrument was devised
which measures musical self-concept and artistic self-concept, as well as self-concept in other areas.
To do this, the format of the PSPCSA was retained, but the content - pictures and questions - was
modified.
In addition to assessing children's self-concepts, the present study assessed family musical
environment through a parental questionnaire. Although the family environment that children grow up
in probably affects their self-concepts, including their musical self-concept, surprisingly little research
has been conducted on family influence on children's musical development. No research directly
addresses the initial formation of young children's musical self-concept, which is likely a necessary
precursor for individuals to continue to be involved in music as they get older and perhaps later decide
to become professional musicians. A study of concert pianists from the U.S. found that 19% came
from families with no prior musical involvement (Sosniak, 1985). Perhaps what might matter more to
a child's musical development than whether a child's family members are musically involved
themselves is whether a child's family is supportive of the child's musical activities. For example, in a
British study of musically gifted 10- to 18-year-olds, 86% of the students benefited from some form
of parental encouragement or pressure to practice (Sloboda & Howe, 1991). Note that this study did
not include children as young as those in the present research. Because parents' attitudes towards
music may affect their children's musical self-concepts, an additional component of this study was a
parent questionnaire assessing attitudes towards music and family musical environment. Not much
research has focused on parents' attitudes towards music, with such notable exceptions as the 1994
Gallup poll measuring Americans' attitudes towards music, and a measure of the home musical
environment which correlated with second graders' musical ability as assessed by teachers (Brand,
1986).
METHOD
Participants
Participants were 88 children between the ages of 5 and 8, and their parents. The sample consisted of
43 first-grade students and 45 second-grade students. There were 46 girls and 42 boys. The Northern
California school district from which students were drawn is about 44% European American, 29%
Latino, 19% Asian American, and 6% African American. Most were attending a public elementary
school which was the first to participate in a pioneering music program, Guitars in the Classroom,
with the remainder attending a neighboring public school without a music program.
Materials
Self-Concept: A new adaptation of the Pictorial Scale of Perceived Competence and Acceptance for
Young Children (Harter & Pike, 1983) was used. Whereas the original measure assessed scholastic
competence, physical competence, peer acceptance, and maternal acceptance, the revised measure
assesses musical competence, artistic competence, scholastic competence, physical competence,
prosocial competence, and acceptance of physical appearance. Child participants point to which of
two pictured children is more like them, then answer a second question asking to what degree they are
like the pictured child. This two-step process results in each variable being measured on a 4-point
forced choice scale.
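The two-step procedure above can be sketched as a small scoring function. This is a minimal sketch under the assumption that "a lot like the more competent pictured child" earns the highest score; the function and parameter names are illustrative, not taken from the instrument itself.

```python
def score_item(chose_competent_child: bool, a_lot_like: bool) -> int:
    """Convert the two forced choices into a single 1-4 score.

    Step 1: the child points to one of two pictured children (the more
    or the less competent one). Step 2: the child says whether he/she
    is 'a lot' or 'a little' like that child.
    Assumed scoring direction: 4 = a lot like the competent child.
    """
    if chose_competent_child:
        return 4 if a_lot_like else 3  # a lot like the competent child -> 4
    return 1 if a_lot_like else 2      # a lot like the other child -> 1

def subscale_mean(item_scores: list[float]) -> float:
    """Average the item scores within one area, e.g. musical competence."""
    return sum(item_scores) / len(item_scores)
```

Averaging the item scores within each area would then yield per-area self-concept means of the kind reported in Table 1.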
Questionnaire: Parents were asked questions about the family's active experience with music, the
family's listening habits, and the parents' attitudes towards music. Some questions were adapted from
Brand's (1986) Home Musical Environmental Scale (HOMES).
Procedure
Child and parent participation was on a voluntary basis. Child participants were individually
administered the revised version of the Pictorial Scale of Perceived Competence and Acceptance for
Young Children at the school site. Parent participants filled out the questionnaires at home and sent
them back to school with their children. Data were collected at the beginning and at the end of the
school year.
Design
A 2 (grade) x 2 (school) x 2 (gender) x 2 (time of testing, within subjects) ANOVA was used to analyze child self-concept
data. Chi-squares and correlations were used to analyze parent questionnaire data.
RESULTS
Child Self-Concept
Music vs. Other Activities: For the overall mean values of the different areas of self-concept, see Table
1. Apparently, the first and second graders studied are more confident in their abilities in everything
measured except singing and playing musical instruments.
TABLE 1 - MEAN VALUES OF CHILD SELF-CONCEPT IN DIFFERENT AREAS
(columns: Self-Concept Area, Fall, Spring)
Developmental Differences: See Figure 1. Not surprisingly, second graders rated themselves better at
reading and at math than did first graders, an assessment that may have a basis in reality. In contrast,
however, second graders rated themselves worse at singing and at playing musical instruments than
did first graders.
Gender Differences: See Figure 2. Girls rated themselves better at singing than did boys. Boys rated
themselves better at climbing than did girls.
musical abilities in the spring. Specifically, parents' endorsement of the statement "I can't imagine life
without music" was positively correlated with child's later self-rated singing ability (r=.34, p<.05).
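A correlation between a yes/no endorsement and a 4-point self-rating is a point-biserial correlation, which reduces to an ordinary Pearson correlation with the endorsement coded 0/1. The sketch below shows the computation; the data are invented for illustration only and are not the study's data.

```python
from statistics import mean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented data: 1 = parent endorsed "I can't imagine life without
# music", 0 = did not; paired with the child's spring self-rated
# singing score on the 1-4 scale.
endorsed = [1, 1, 0, 0, 1, 0, 1, 0]
singing = [4, 3, 2, 1, 4, 2, 3, 2]
r = pearson_r(endorsed, singing)
```

With the real data, a significance test against a null of zero correlation would give the reported p-value.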
DISCUSSION
In the early elementary years, children's self-concepts in music are low compared to their
self-concepts in other areas, such as physical activity. The current research, which found that first- and
second-graders' self-concepts in playing a musical instrument and in singing were lower than their
self-concepts in other areas, corroborated Eccles et al.'s (1993) finding that first-, second-, and
fourth-graders' self-concepts in playing a musical instrument were lower than their self-concepts in
other areas. In fact, although in the current study children's self-concepts in most areas increase or
remain relatively stable from first to second grade, their self-concepts in music undergo a decline
during this period. In contrast, Eccles et al. (1993) did not find significant differences in children's
self-concepts in most areas from first to second grade, but did from second to fourth grade. The
difference between the findings of the current study and Eccles et al.'s (1993) findings perhaps can be
explained by the differing methodologies used to assess self-concept. Eccles et al. (1993) read
questions aloud to children, who used a pictorial representation of a 7-point Likert scale to respond.
Since first- and second-graders have not necessarily mastered the ability to make multiple
comparisons yet, and hence would not necessarily understand the relatively complex 7-point Likert
scale, this method would be less sensitive to measuring their self-concept than our method of asking
two dichotomous questions in succession. In any case, Eccles et al.'s (1993) findings suggest that it is
crucial to supply musical activities to children before third or fourth grade, before they become
convinced that they are not good at music and fail to develop any interest in doing musical things. Our
findings, using a methodology more compatible with the cognitive limitations of first- and
second-graders, suggest that it may be critical to introduce children to music even earlier. This
empirical finding is supported by developmental psychology theory asserting that the ages from six to
eight are vital to the early formation of self-concept, i.e. when children begin to think of themselves
and other people in terms of traits, not just their physical characteristics and the things they do
(Damon & Hart, 1982), and when social comparison becomes very important (Sameroff & Haith,
1996).
Previous research regarding children's gender role development and music has found girls and boys to
gravitate towards different musical instruments as early as in third grade (Abeles & Porter, 1978), and
that girls express more favorable attitudes towards music than boys in the third through sixth grades
(Nolin, 1973). Children's gender role development with respect to music has rarely been studied
before third grade. In a notable exception, the self-concepts of first-, second-, and fourth-graders were
studied (Eccles et al., 1993); girls perceived themselves as better at music and reading than boys
perceived themselves to be, and boys perceived themselves as better at sports and math than girls
perceived themselves to be. In the current study, two of these findings were corroborated: girls have
more confidence in their singing abilities than boys have in theirs, and boys have more confidence in
their climbing abilities than girls have in theirs. Future research could investigate the phenomenon of
boys already thinking that they are not as good at singing relative to girls as early as first grade by
administering a measure of gender role development to participants. Then it would be possible to
discover whether girls are more likely to perceive themselves as good at music, or whether
feminine-stereotyped individuals are more likely to perceive themselves as good at music.
Parental perceptions of children's competence in certain areas can affect the development of children's
competence in those areas. With regard to math, English, and sports, parents' perceptions of their
children's abilities have been shown to be influenced by children's gender, and not necessarily by
children's actual performance (Jacobs & Eccles, 1992). Since previous research has not been carried
out to study parents' perceptions of their children's competence in music, this exploratory research is
noteworthy for showing a link between positive parental attitudes towards music (belief in the
importance of music for themselves personally) and child musical self-concept. Future data collection
could be improved by having parents rate their attitudes on a Likert scale rather than by globally
endorsing or failing to endorse a statement. In previous research with adolescents talented in music
and other arts, most participants reported that someone's approval helps motivate them to do their
music or other art; a mother's approval is most frequently mentioned (Chin, 1997). Since many of the
adolescent participants in the aforementioned study had been involved in music and other arts for
several years, it is of vital importance to carry out more research with parents of young children.
REFERENCES
Abeles, H. F., & Porter, S. Y. (1978). The sex-stereotyping of music instruments. Journal of Research
in Music Education, 22.
Ablow, J. C., & Measelle, J. R. (1993). Berkeley Puppet Interview: Administration and scoring system
manuals. Berkeley: University of California.
Boden, M. A. (1979). Piaget. Brighton: Harvester Press.
Brand, M. (1986). Relationship between home musical environment and selected musical attributes of
second-grade children. Journal of Research in Music Education, 34, 111-120.
Chin, C. S. (1997). The social context of artistic activities: Adolescents' relationships with friends and
family. Unpublished manuscript, University of California at Santa Cruz.
Damon, W., & Hart, D. (1982). The development of self-understanding from infancy through
adolescence. Child Development, 53, 841-864.
Eccles, J., Wigfield, A., Harold, R. D., & Blumenfeld, P. (1993). Age and gender differences in
children's self- and task-perceptions during elementary school. Child Development, 64, 830-847.
Eder, R. A. (1990). Uncovering young children's psychological selves: Individual and developmental
differences. Child Development, 61, 849-863.
Harter, S., & Pike, R. (1983). The Pictorial Scale of Perceived Competence and Social Acceptance for
Young Children. Denver, CO: University of Denver.
Jacobs, J. E., & Eccles, J. (1992). The impact of mothers' gender-role stereotypic beliefs on mothers'
and children's ability perceptions. Journal of Personality and Social Psychology, 63, 932-944.
Marsh, H. W., Craven, R. G., & Debus, R. (1991). Self-concepts of young children 5 to 8 years of
age: Measurement and multidimensional structure. Journal of Educational Psychology, 83, 377-392.
Nolin, W. H. (1973). Attitudinal growth patterns toward elementary school music experiences.
Journal of Research in Music Education, 21, 123-134.
Rauscher, F. H., Shaw, G. L., Levine, L. J., Wright, E. L., Dennis, W. R., & Newcomb, R. L. (1997).
Music training causes long-term enhancement of preschool children's spatial-temporal reasoning.
Neurological Research, 19, 2-8.
Sameroff, A. J., & Haith, M. M. (1996). The five to seven year shift: The age of reason and
responsibility. Chicago: University of Chicago Press.
Proceedings paper
Children have difficulty in perceiving harmonic changes until the age of eight or nine (Bentley, 1966;
Franklin, 1956; Hufstader, 1977; Imberty, 1969; McDonald & Simons, 1989; Merrion, 1989; Moog,
1976; O'Hearn, 1984; Petzold, 1966; Schultz, 1969; Shuter-Dyson & Gabriel, 1981; Simons, 1986;
Taylor, 1969; Vera, 1989; Zimmerman, 1971). Young children do not react adversely to dissonant
accompaniments to a melody (Antochina, 1939; Believa-Exempliarskaia, 1925; Bridges, 1965; Moog,
1976; Rupp, 1915, cited in Funk, 1977; Revesz, 1954; Sloboda, 1985; Teplov, 1966), or dissonant
chords and intervals (Valentine, 1913; Yoshikawa, 1973) and are inconsistent when identifying
dissonant and consonant stimuli (Zenatti, 1974). They often fail to perceive the difference between a
theme and its harmonic variations (Hufstader, 1977; O'Hearn, 1984; Pflederer & Sechrest, 1968;
Taylor, 1969) and to identify the number of tones present in a chord (Vera, 1989). They also have
difficulty in expressing their perception of harmony verbally (Hair, 1981) or through the use of visual
representations (Hair, 1987). These findings have often been taken as an indication that young
children are incapable of perceiving harmony. However, studies have shown that kindergarten
children can recognize simple chord changes (Costa-Giomi, 1994a, 1994b) and 6-year-olds can
identify a chord that is different between pairs of short progressions (Zenatti, 1969). First graders
readily discriminate between pairs of chords (Hair, 1973) and seem confused when asked to sing a
familiar song with unfamiliar accompaniments (Sterling, 1985).
Research has provided little information about how to help young children learn harmonic elements.
Are there any factors that affect children's harmonic perception and understanding and that teachers
can manipulate in order to teach harmony to children effectively? The present study addressed this
question. The purpose of the study was to identify developmental trends in young children's
perception of simple accompaniments to familiar songs and musical factors that affect their harmonic
perception.
Methodology
Children attending kindergarten through third grade at a public school in Montreal participated in the
study. The school had no formal music program. The classroom teachers, who developed singing
activities, did not teach harmonic concepts to the children and did not use any harmonic musical
instrument to accompany the children's singing. There were 18 children in kindergarten, 30 in first
grade, 22 in second grade, and 21 in third grade.
Children were provided with 10 weeks of music instruction. The 30-minute weekly lessons were
taught by a music specialist and focused on harmonic elements. The goals of the short music program
were for the children to learn to identify and play a simple chord progression (I V I) on the
omnichord, to sing short songs with an accompaniment of tonic and dominant chords, to identify
chord changes in more complex chord progressions, to perceive the difference between chord changes
and chord position changes, and to learn that most songs end on a tonic chord. Children learnt songs
based on I and V, including the three songs used in the posttest, played the accompaniment to these songs
individually on the omnichord, wrote the accompaniment of the songs on the board, played games
based on the aural discrimination of chord changes and chord position changes, and practiced how to
identify the chords in simple progressions played on the omnichord.
The posttest had two parts. The first part was a paper-and-pencil group test and the second was an
individual test requiring singing, the performance of a simple accompaniment on the omnichord, and
selected perceptual tasks. The present manuscript reports the results of the first part of the test only.
The first part of the test consisted of two different tasks, one requiring the discrimination of various
accompaniments to familiar songs and the other requiring the identification of the chords of a
simple accompaniment. First, the music specialist sang a familiar song to the children in four different
ways accompanying herself with the omnichord. For each rendition of the song, children were asked
whether the song sounded right. The rendition considered to be correct was the one children had heard
during the treatment and that was based on the conventional tonic and dominant chords. In one of the
incorrect versions, the chords of the accompaniment were presented in reversed order, that is, tonic
chords were replaced with dominant chords and dominant chords with tonic chords. In the second
incorrect version, the accompaniment was transposed a fifth higher in the middle of the song while
the melody was sung in the original key throughout the performance. In the third incorrect version,
both the melody and the accompaniment were transposed a fifth higher in the middle of the song.
Kindergarten children were presented with the four renditions of only one song: "Firilala."
The other children were presented with the corresponding renditions of two additional songs "Blue
Bird" and "Row Row Row Your Boat." The order of presentation of the four renditions was different
for each of the three songs. All children listened to the stimuli in the same order.
To complete the second task, children were asked to identify the eight chords of the accompaniment
of a familiar song. Children listened to the music teacher sing the refrain of the song "Firilala" six
times with a simple omnichord accompaniment. While they listened to the stimuli, they wrote the I
and V chords on the answer sheet, which included drawings of eight birds representing the eight
chords of the refrain (the song "Firilala" is about a bird wedding). Kindergarten children were not
asked to complete this task.
Results
Task 1
Children's responses to each of the four renditions of the songs were considered correct and given 1
point or incorrect and given 0 points. Because kindergarten children only listened to one song, two
analyses of variance (ANOVA) with repeated measures were performed with the data. One ANOVA
included the responses of first-, second-, and third-graders who listened to the four renditions of three
songs. The other ANOVA was based on the responses of children in kindergarten, first, second, and
third grade to the four renditions of the song "Firilala".
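The 0/1 scoring just described can be sketched in a few lines of Python. The response data and the grade and rendition labels below are hypothetical, serving only to illustrate how per-grade accuracy would be computed before the ANOVAs:

```python
from collections import defaultdict

def score(response, rendition):
    # 1 point if the child's judgement matches the expected answer:
    # "right" for the correct rendition, "not right" for the three
    # incorrect ones; 0 points otherwise.
    expected = "right" if rendition == "correct" else "not right"
    return 1 if response == expected else 0

# Hypothetical responses: (grade, child) -> {rendition: response}
responses = {
    ("grade1", "child_a"): {"correct": "right", "reversed": "right",
                            "acc_transposed": "not right",
                            "both_transposed": "not right"},
    ("grade2", "child_b"): {"correct": "right", "reversed": "not right",
                            "acc_transposed": "not right",
                            "both_transposed": "not right"},
}

# Mean accuracy per grade, pooled over the four renditions
totals = defaultdict(list)
for (grade, _), answers in responses.items():
    for rendition, response in answers.items():
        totals[grade].append(score(response, rendition))

means = {g: sum(v) / len(v) for g, v in totals.items()}
```

Tables of such per-grade, per-rendition means are what the repeated-measures ANOVAs reported below operate on.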
The results of the first ANOVA showed that grade level and song affected children's responses,
F[2,438] = 5.20, p < .01 and F[2,438] = 13.68, p < .01 respectively. Three interactions were
significant: grade x song, F[4,438] = 5.24, p < .01, grade x rendition, F[6,438] = 2.49, p = .02, and
song x rendition, F[6,438] = 2.32, p =.04. Analyses of simple effects indicated that song affected the
responses of first graders but not those of the older children. While first graders responded more
accurately when presented with the renditions of "Firilala" than those of the other songs, second- and
third-graders responded quite evenly to the three songs. The analyses of simple effects also showed
that grade affected children's responses to two of the incorrect renditions of the songs, the one in
which the dominant and tonic chords were switched and the one in which both melody and
accompaniment were transposed. While these two renditions elicited the lowest scores from first
graders, they elicited the highest scores from children in second and third grade. Further analyses of
simple effects indicated that song affected children's responses to the two incorrect renditions that
included transpositions. While these renditions of "Firilala" elicited the highest responses from most
children, the same renditions of the other songs elicited the lowest scores.
The results of the second ANOVA performed with the data from the song "Firilala" showed that
children's responses differed according to grade and rendition F(3,270) = 4.17, p < .01, F(3, 270) =
4.85, p < .01 respectively. The interaction between grade and rendition was significant, F(9,270) =
2.66, p < .01. Analyses of simple effects indicated that grade affected children's responses to two of
the renditions: the one in which the dominant and tonic chords were switched and the correct rendition.
The comparison of means showed that rendition affected the performance of kindergarteners but not
that of the older children. Further analysis determined that kindergarten children provided more
accurate responses when presented with the two renditions which included transpositions than when
presented with the other renditions.
Task 2
The data from the second task was analyzed in order to see developmental trends in the way young
children perceive chords. An idea that was stressed during the training was that the accompaniments
of most songs usually end on the tonic chord. I found that 60% of first graders, 95% of second
graders, and 91% of third graders identified the last chord of the accompaniment as I.
The last two chords of the accompaniment were tonic chords. I found that 27% of first graders, 32% of
second graders, and 62% of third graders were able to identify these chords accurately.
The first four chords of the accompaniment were I I V V. I found that 20% of first graders, 64% of
second graders, and 86% of third graders identified this simple progression accurately.
The last four chords of the accompaniment were I V I I. Only one child in first grade, two in second
grade, and two in third grade identified the chords of this more complex progression accurately. The
five children who were able to do so also identified the first four chords of the song accurately. These
five children were the only subjects who identified all eight chords of the accompaniment correctly.
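The chord-identification task above amounts to matching each child's written sequence against the target progression I I V V I V I I. A minimal sketch, with a hypothetical answer sheet from one child:

```python
# Target accompaniment of the "Firilala" refrain, as described above
TARGET = ["I", "I", "V", "V", "I", "V", "I", "I"]

def phrase_correct(answer, start, end):
    """True if the child's chords match the target over positions start..end-1."""
    return answer[start:end] == TARGET[start:end]

# Hypothetical answer sheet: first phrase identified, second phrase missed
answer = ["I", "I", "V", "V", "I", "I", "I", "I"]

first_phrase_ok = phrase_correct(answer, 0, 4)    # I I V V
second_phrase_ok = phrase_correct(answer, 4, 8)   # I V I I
last_chord_ok = answer[-1] == TARGET[-1]          # most songs end on the tonic
all_eight_ok = first_phrase_ok and second_phrase_ok
```

In this scheme a child counts toward the "identified all eight chords" group only when both phrase checks pass, mirroring how the five fully accurate children were identified.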
Discussion
The results of the study indicate that there are certain factors that affect children's performance in
harmonic perception tasks and show developmental trends in the way children perceive simple
harmonic progressions. Children's familiarity with a song affects the accuracy with which they
discriminate among various accompaniments to its melody. Although children learnt the three songs
used as stimuli during the training, they were more familiar with one of the songs (i.e., "Firilala") than
with the others, because "Firilala" was sung every week for 10 weeks while the other songs were only
introduced during the fifth week of instruction. Children were more successful in discriminating
between an incorrect and a correct rendition of "Firilala" than in doing so between the renditions of
the other songs. This was particularly true for the younger children. Perhaps, by listening to and
singing the same song for many weeks, children become more aware of the various features of the
song including those on which they would not spontaneously focus their attention. It is known that
young children tend to focus their attention on musical elements other than harmony but that they may
be prompted to focus on this element if presented with simple stimuli (Costa-Giomi, 1994a, 1994b). The
results of this study suggest that teachers can direct students' attention to the harmonic features of the
music more successfully by using the songs that are most familiar to the children. This practice might
be effective especially when introducing complex accompaniments. In this study, the stimuli that were
most difficult to discriminate were two renditions of the less familiar songs; in fact, the same
renditions of the most familiar song elicited the highest scores.
Only a few students could identify the eight chords of the accompaniment to the refrain of "Firilala"
accurately, indicating that the identification of chords is a difficult task for young children.
Even in third grade, children were usually unable to identify the tonic and dominant chords of simple
accompaniments. However, children were able to identify chords with different degrees of accuracy
depending on their grade level. Third graders were more successful in identifying the chords than
were second graders, and in turn, second graders were more accurate in their identifications of the
progressions than were first graders. The accompaniment children were asked to identify was
composed of two phrases. The first phrase, which was quite simple (I I V V), was identified by 86%
of the third-grade children and 68% of the second graders. Despite its apparent simplicity, only 20%
of the first grade children could identify the four chords accurately. The second phrase of the
accompaniment was more complex because it presented two chord changes (I V I I). The difficulty of
this progression was reflected in the low number of children who identified it accurately.
Interestingly, the five children who were able to do so were distributed among the three grade levels
indicating that even first graders may be able to identify the chords of a tonic-dominant progression.
Although children in general applied the knowledge they had learnt during the lessons when taking
the test, the younger ones were not as consistent as the older students in their use of new knowledge.
For example, most children remembered that the last chord of the accompaniment was likely to be the
tonic but 40% of first graders failed to do so. It is clear that young children benefit from the repetition
of simple concepts, especially those that are more foreign to them.
Teachers should be aware of the difficulties young children experience when presented with simple
harmonic tasks and should consider carefully the inclusion of harmonic concepts in the early
childhood music curriculum. It seems important that they provide children with opportunities to
apply harmonic concepts through performance activities in addition to perceptual tasks.
References
Bentley, A. (1966). Musical ability in children and its measurement. New York: October House Inc.
Bridges, V. (1965). An exploratory study of the harmonic discrimination ability of children in
kindergarten through grade three in two selected schools. Unpublished Doctoral dissertation, Ohio
State University, Columbus.
Costa-Giomi, E. (1994b) Recognition of Chord Changes by 4- and 5-year-old American and
Argentine Children. Journal of Research in Music Education, 42, 68-85.
Franklin, E. (1956). Tonality as a basis for the study of musical talent. Goteberg: Gumpert.
Funk, J. D. (1977). Some aspects of the development of music perception. Dissertation Abstracts
International, 38, 1919B. (University Microfilms No. 77-20,301).
Hair, H. I. (1973). The effect of training on the harmonic discrimination of first-grade children.
Journal of Research in Music Education, 21, 85-90.
Hair, H. I. (1981). Verbal identification of music concepts. Journal of Research in Music Education,
29, 11-21.
Hair, H. I. (1987). Descriptive vocabulary and visual choices: children's responses to conceptual
changes in music. Bulletin for the Council of Research in Music Education, 91, 59-64.
Hufstader, R. A. (1977). An investigation of a learning sequence of music listening skills. Journal of
Research in Music Education, 25, 184-196.
Imberty, M. (1969). L'acquisition des structures tonales chez l'enfant. [The acquisition of tonal
structures in children]. Paris: Klincksieck.
McDonald, D. T. & Simons, G. M. (1989). Musical growth and development birth through six. New
York: Schirmer.
Merrion, M. (1989). What works: instructional strategies for music education. Reston: Music
Education National Conference.
Moog, H. (1976). The musical experience of the pre-school child. (C. Clarke, trans.), London: Schott
& Co., Ltd. (original work published in 1968).
O'Hearn, R. N. (1984). An investigation of the response to change in music events by children in
grades one, three, and five. Dissertation Abstracts International, 46, 371A.
Petzold, R. G. (1966). Auditory perception of musical sounds by children in the first six grades.
(Cooperative Research Project No.1051). Madison: University of Wisconsin. (ERIC Document
Reproduction Service No. ED 010 297).
Pflederer, M. & Sechrest, L. (1968). Conservation-type responses of children to musical stimuli.
Bulletin for the Council of Research in Music Education, 13, 19-36.
Schultz, S. W. (1969). A study of children's ability to respond to elements of music. Unpublished
doctoral dissertation, Northwestern University, Evanston, IL.
Shuter-Dyson, R., & Gabriel, C. (1981). The psychology of musical ability. (2nd edition). London:
Methuen.
Sloboda, J. A. (1985). The musical mind: the cognitive psychology of music. Oxford: Clarendon Press.
Sterling, P. A. (1985). A developmental study of the effects of accompanying harmonic context on
children's vocal pitch accuracy of familiar melodies. Dissertation Abstracts International, 45, 2436A.
Taylor, S. (1969). The musical development of children aged seven to eleven. Doctoral dissertation.
University of Southampton, UK.
Teplov, B. M. (1966). Psychologie des aptitudes musicales [Psychology of musical aptitudes]. Paris:
Presses Universitaires de France.
Valentine, C. W. (1913). The aesthetic appreciation of musical intervals among school children and
adults. British Journal of Psychology, 6, 190-216.
Vera, A. (1989). El desarrollo de las destrezas musicales [The development of musical abilities].
Infancia y Aprendizaje, 45, 107-121.
Yoshikawa, S. (1973). Yoji no waon-Kan no hattatsu [A developmental study of children's sense of
tonality]. Ongaku-Gaku, 19 (1), 5-72.
Zenatti, A. (1969). Le developpement genetique de la perception musicale [The genetic development
of musical perception].
Proceedings paper
BACKGROUND
The practice of music therapy in Britain began in the 1950s, with the British Society of Music
Therapy being established in 1958. Music Therapy is a specialised and rapidly developing profession.
Due to the diverse activities and approaches within the discipline, finding an adequate definition is
difficult. To summarise, music therapy can be viewed as the use of sounds and symbols within an
evolving relationship between child or adult and therapist to support and encourage physical, mental,
social and environmental well-being (Bunt, 1994).
Documentation of music therapy practice has become an integral part of a therapist's activities. In
addition, there is a growing interest in quantitative and qualitative research methodologies that can
help us understand in more detail the process and outcomes of this intervention (Purdie, 1997).
Consequently, this has led to investigations of the influence of music therapy on specific aspects of
behaviour such as communication (MacDonald, O'Donnell and Davies, 1999). This research has also
focused on the utility of music therapy as a unique therapeutic method within clinical practice
(Aldridge, 1993).
There has been over the last two decades an upsurge in qualitative social psychological research based
on communication, language and texts. This is due partly to an interdisciplinary trend towards
communication-orientated research in sociology, women's studies, anthropology, media studies and so
forth. Language is now viewed as an active site for the continuing negotiation of various meanings
and often investigations examine dyads or group situations. It has been suggested that the most
important element of task activity in groups is the dialogue among group members (Miell &
MacDonald, in press; Tolmie, Howe, Mackenzie and Geer, 1993). Rogoff (1990) considers children
to be apprentices
in thinking, active in their efforts to learn, observe, and participate with peers and with skilled
members of society. Central to Rogoff's theory of guided participation is the notion of
intersubjectivity, or sharing of focus and purpose between children and their more skilled partners and
their challenging and exploring peers. Progress occurs when children internalise or appropriate social
processes. Taking this view, the music therapy sessions can be seen as socially collaborative situations
where a skilled member of society (music therapist) is guiding the children through a musical and
verbal language.
Within her discussion of guided participation and intersubjectivity, Rogoff (1990) also talks about
how she sees the creative process occurring among individuals. She suggests that the mutual
involvement of people working on similar issues is part of the social context of creativity. In the case
of music therapy this context is the music therapy session and the creative musical relationships that
develop over time between therapist and client group. The sessions therefore can be seen as a
collaborative situation where the music therapist employs both musical and verbal dialogues to reach
the objectives of the sessions as well as develop the children's creativity and imagination.
AIMS
Taking the above view of social group processes and musical communication, this study investigates
the effects of music therapy on social development, as examined through the dialogues employed
within the sessions, and on communication, taken as the musical relationships that develop over 10
weekly sessions of music therapy.
METHOD
The participants were 7 children aged between 8 and 11 years (Group 1 = 2 boys and 2 girls; Group 2
= 3 boys). The sessions were conducted as part of a program of music therapy in special education
schools. After the 10 sessions of music therapy, the videotape of each session was analysed and
transcribed to determine how musical and verbal dialogues developed.
The sessions were conducted on a weekly basis, on Friday mornings, for 20 minutes in the school's
music room. A professional music therapist conducted all the sessions.
The music therapist developed a structure for the sessions: hello song, egg-shaking game,
instrument playing/rhythm game, goodbye song. Analysis of the sessions focused on each of the
games and interpreting how the musical relationships were developing and how verbal dialogue
supported these relationships.
RESULTS
Transcripts from the videos show how the musical and verbal dialogues reflect communication and
social development over the 10 weeks, and how verbal dialogue supported the development of the
musical relationships. Analysis of the music therapist's dialogue indicated how the music therapist
acts as a facilitator, guiding the children through musical communication and assisting the
development of verbal dialogue. For example, the music therapist's simple and repetitive dialogue
('Listen to each other... try again... go slower...') allows the musical relationships to be the main form
of communication. Consequently, the music therapist's dialogue develops a scaffold which enhances
the musical communication. It has also been shown that repetitive and structured tasks assist children
with moderate learning disabilities, as they help the child to master a task and feel confident with
their work.
Analysis of musical development indicates how the children's skill in the games advances, as well as
their imagination and confidence within these games. For example, when Sessions 1 and 10 are
compared, both Group 1 and Group 2 show improvement in their confidence to play and experiment
with the instruments.
DISCUSSION
Chomsky (1990) claims that the structure of language does not allow direct expression of our thought,
as the knowledge that we possess is not always reducible to words. Language has the limitations of
representing what one thinks without necessarily being what one thinks. This also relates to
Wittgenstein's (1953) arguments that we can never be entirely sure that we do in fact correctly
understand precisely what is intended, that language is not simply a matter of transmitting intentions
and knowledge. What is being proposed here is that through the analysis of the musical relationships
and verbal dialogue within the music therapy session, we can develop a clearer picture of how the
therapy sessions evolve, and of how the music therapist guides the children within the session,
assisting them in their development as apprentices in musical thinking.
The research shows examples of how the children's musical expressions can be demonstrated both
musically and verbally as both the children and the music therapist establish a common understanding
of the session by projecting their thoughts and ideas directly into the musical games within the
sessions. In this way music therapy can be viewed as a form of musical and verbal discourse, a
discourse that is through music rather than about music. The children and the music therapist do not
therefore need to discuss their ideas, as these become apparent through direct action, the musical
games. It is through this direct action that a common understanding of the musical games develops
between the music therapist and the children, and it is from this that the musical relationships,
imagination, creativity and confidence of the children develop. It is this shared musical reality, which
is the principal form of communication, that the verbal dialogue assists and is based around. The
simple, repetitive statements from the music therapist and the short statements from the children
indicate that the musical relationships are what is important within the music therapy sessions.
CONCLUSION
The findings suggest that music therapy can be analysed through the musical and verbal dialogue that
develops within the sessions. These forms of dialogue are unique to the music therapy session and
assist the development of the musical communication, experimentation, creativity and imagination of
children with moderate learning disabilities. The results of the experiment also indicated how the
music therapist acts as a facilitator to the musical relationships, by employing certain dialogue
techniques. The therapist assists, through guided participation, the development of the children's
musical relationships. The results of the experiment are encouraging since there is a limited amount of
research on the effects of music therapy with groups of children with moderate learning disabilities
that focuses on the development of musical relationships and on the verbal dialogue within music
therapy sessions.
KEY WORDS
Music therapy, learning disabled children, social and communication skills, musical relationships and
verbal dialogue.
REFERENCES
Aldridge, D. (1993) Music therapy research 1: A review of the medical research literature within a
general context of music therapy research. The Arts in Psychotherapy, 20, 11-35.
Bunt, L. (1994). Music Therapy: An Art Beyond Words. London: Routledge.
Chomsky, N. (1990). Language and mind. In D. H. Mellor (Ed.), Ways of Communicating: The
Darwin College Lectures. Cambridge: Cambridge University Press.
MacDonald, R.A.R., O'Donnell, P.J., & Davies, J.B. (1999). Structured music workshops for
individuals with learning difficulty: an empirical investigation. Journal of Applied Research in
Intellectual Disabilities, 12(3), 225-241.
Miell, D. & MacDonald, R.A.R. (in press). Children's creative collaborations: The importance of
friendship when working together on a musical composition. Social Development.
Purdie, H. (1997). Music therapy with adults who have traumatic brain injury and stroke. The British
Journal of Music Therapy, 11(2), 45-51.
Tolmie, A., Howe, C., Mackenzie, M. and Geer, K. (1993). Task design as an influence on dialogue
and learning: Primary school group work with object flotation. Social Development, 2(3), 189-211.
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social context. Oxford
University Press.
Wittgenstein, L. (1953). Philosophical Investigations (G.E.M. Anscombe, Trans.). Oxford: Blackwell.
Proceedings paper
Introduction
To qualify for a career as a professional musician, it is necessary to receive many years of high quality
instrumental teaching (Ericsson 1997; Manturzewska 1990; Sosniak 1990). The years the students
spend in higher music education institutions are crucial in this respect, and students often consider the
study of their principal instrument to be most important (Nielsen 1998). The quality of instrumental
teaching is therefore of vital concern to institutions of higher education.
Student evaluation of teaching is one means, in use in many institutions, of developing the quality of
teaching. In this context student evaluation is used formatively, to improve teaching, and not
summatively as a basis for decisions on tenure, merit pay and so on (Centra 1993). The question
addressed in this paper is whether student evaluation of individual teaching represents a special challenge to
the teacher-student relationship as compared to evaluation of class teaching. Research on higher music
education substantiates the close and personal relationship that normally develops between the
instrumental teacher and his student (Kingsbury 1988; Nettl 1995; Nielsen 1998). In such dyadic
teacher-student settings, it is vital to develop and preserve a good relationship between the two
parties. Tiberius and Flak (1999) claim that in every relationship between teacher and student there will
be some disappointment and negative emotions. Dyadic teaching and learning represent a special
challenge, however, because «... the overt civility of dyadic relationships can mask unexpressed
tensions and (...) these tensions, if not addressed, can increase to the explosive point, at which the
relationship itself is destroyed» (ibid. p.3). Therefore, they conclude, it is important to «...structure a
relationship that can handle conflicts and tensions routinely and thereby prevent escalation» (p.5).
Student evaluation can be understood as a routine built into the relationship with the purpose of
unmasking tensions in a controlled manner, thereby enabling the parties to address the problems.
On the other hand, being subject to evaluation is not always pleasant; student evaluation reflects on
the teacher's self-respect as a professional and can sometimes be experienced as wounding,
threatening and demoralising (Braskamp & Ory 1994:128; Moses 1986; Ryan et al. 1980; Seldin 1989,
1993; Strike 1991). It is therefore a common recommendation in evaluation literature that student
evaluation should be conducted anonymously. In individual instrumental teaching, however, it is often
difficult, and perhaps not even very productive, to maintain anonymity. It is therefore a relevant
question to ask whether student evaluation of individual instrumental teaching might, in some cases, actually harm the teacher-student relationship.
Method
Since so little research has been done in this particular field, I chose to do an exploratory study. To
understand how student evaluation works, it is vital to understand how the persons involved
themselves look at it. I therefore conducted semi-structured qualitative research interviews (Kvale
1996) with principal instrument teachers and their students. The sample consists of 9 instrumental
teachers with long teaching experience. The interviews with the teachers indicated that there were
three different approaches to the use of student evaluation represented. I therefore chose one
representative for each of these approaches (teachers A, C and H) and interviewed 9 of their students
(students a1-3, c1-3, and h1-3). The students in the sample had all completed a minimum of two years
of study at the Academy, and they all had music performance as a central part of their programme.
Results
The results indicate that both teachers and students are well aware of the fact that the evaluation in
reality is not anonymous. The teacher has such a limited number of students, and normally knows
each of them so well that he can identify them. There might be ways to reduce the possibility of
revealing the identity of the students, but both students and teachers claim that student evaluation
might not serve its purpose then. If the evaluation does not reveal the needs and opinions of the
individual student to the teacher, it will not be of much help to him in tailoring his teaching for that
particular student. When discussing student evaluation in this particular context, it is important to
understand that, whatever the procedures are, it is in reality not anonymous.
A fundamental question, then, is whether the students dare to be honest in their evaluation. If not, student
evaluation loses its point. The results indicate that this can be a problem. One reason for this seems to
be that the student might be afraid of hurting his teacher's feelings, and therefore does not dare to be
frank and honest. This fear is something that preoccupies several of my informants:
This is how teacher F sees this:
... it is something to do with the «chemistry» also, you are kind to each other. They are much more
afraid of hurting the teacher in a way. (Teacher F)
Teacher H also indicates that students might be considerate in what they say:
In a way you have to attach more importance to any hint of objection that crops up and then decide
whether this is only a considerate way of saying that this is hopeless, because they don't dare to express
themselves more strongly. (Teacher H)
When I ask the teachers about their reactions to critical evaluations, some of them answer that
normally they do not feel upset; they can handle whatever comments they receive professionally.
Others again admit that they can feel hurt when being criticised. Teacher C describes her reactions in
this way:
...you are, quite naturally, a bit hurt by negative [comments], especially when you believe you are as
good as I believe I am. What? I, who am «world famous»! and so on. And it definitely hurts a bit.
(Teacher C)
Teacher J experienced some years ago that several of his students filed a complaint against him. This
is how he describes his reaction to being criticised:
When you are as fond of the students as I actually was − I loved the job because of the students − then it
comes as such a disappointment that you cannot describe it with words. ... As a teacher, you have to find
the balance between humility, self confidence and joie de vivre. My self confidence is still there,
strangely enough, even if the joie de vivre received a blow that lasted several years. I still haven't got
over it entirely. I felt as if something died inside me at that time. (Teacher J)
Teacher J says that this experience makes him oppose the use of student evaluation, because the
thought of being evaluated at the end of the year will make him apprehensive. He fears this will
interfere with his teaching and make him a bad teacher.
When I ask the students if they are afraid of hurting their teacher, I receive different points of view.
One of the students answers that:
...you have to realise that getting a good education is your own responsibility. You can't be afraid of
hurting a teacher. You have to tackle the problem yourself and try to criticise. (student c2)
Others again feel that it is difficult to criticise the teacher, and two of the female students imply that
girls in particular might be afraid of hurting their teacher:
Afraid of hurting the teacher, yes, we probably are. I think that's true. I don't know about the boys, but I
have talked with a lot of girls, and I think many girls are afraid of hurting their teacher. (student a1)
It is probably typical of girls that we care more about people. You feel it hurts to criticise someone.
(student c1)
It is evident from the interviews that many of the principal instrument teachers invest a lot more time and
personal commitment in their students than would be expected of a university professor. This personal
relationship between the teacher and his student can be understood as a fundamental trait of individual
instrumental teaching. There seem to be at least two possible reasons for this: Firstly, this type of
teaching implies a one-to-one teacher-student relationship that often lasts several years, years that are
of vital importance in a young musician's life. One student for example, compares the relationship
between teacher and student to a parent-child relationship in order to describe the bonds between
them. Secondly, it seems that the characteristics of the subject matter, the music, force both student and
teacher to expose themselves emotionally, and therefore to come closer to each other on a personal
level. Teacher F touches on this when she says:
F: With regard to having a close relationship - A lot of people say that the teacher-student relationship
should not become too personal, but I find that difficult to regulate. We talk a lot about real feelings
during the lessons, not just 4th finger on f sharp, right. We talk about what this music expresses. It might
sound sentimental, but you have to open up your whole register of feelings, and then you cannot just sit
there and keep a distance to the student. ... You cannot be close in your teaching without being close as
a human being.
I: And I suppose the students might feel the same way, and then they are perhaps afraid of hurting you?
The results indicate that there might be a price to pay for the closeness between the principal
instrument teacher and his student: Some students might not dare to voice any criticism for fear of
hurting the person they feel attached to, thereby destroying the openness and intimacy that is so vital in
this type of teaching.
Another reaction to a negative evaluation can be anger and hostility. Such feelings can in themselves
be a strain on the relationship between teacher and student. In addition, they might lead to reprisals
against the student. Several of my informants comment on the fact that the instrumental teacher is in a
position where he has the means to retaliate in different ways, and that fear of reprisals might stop the
student from expressing any criticism of the teacher or his teaching.
Two of the teachers expressed their concern in this connection:
... the teacher can decide whether you are going to get a job engagement or not, then it is hopeless when
you know that you will be studying with that teacher for the next three years. There is no question of
making any criticism, as they know it will not improve things. The only thing that might happen is that
the relationship might become worse. You will definitely be out of favour with the teacher. (Teacher C)
...they will feel that they might insult me, or that I somehow might reject them if they have something
negative to say. ...In individual teaching, and in the milieu here as a whole, they are more careful not to
come into conflict with anyone. ... If they come into conflict with someone they may have the
impression that it could harm their career. (Teacher E)
When I ask the students if this is something they worry about, I get different answers. A few of them
say that they have never thought about it, but several students say that fear of reprisals has kept them
from being frank and open either with their present teacher or with former teachers.
The fact that the student often "surfs on the contacts that his teacher has in the job market" as one
student puts it, implies that the teacher has an instrument of power that he has the potential to use on
the student. Two students comment on that:
...you know very well that it is preferable not to get onto bad terms with your teacher, because then you
will not get jobs. ... I am very much aware of the fact that if I got into a major conflict with her, I would
have a problem getting those jobs, and those are jobs that I really want. Then it becomes just hopeless.
(student c1)
...because often if you get onto bad terms with your teacher, it implies that you will have difficulties in
the free lance market and the like. It is a problem, really a problem. (student c2)
The principal instrument teacher can choose to use his contacts in the job market for the benefit of his
student, or he can choose not to use them. It is not surprising, then, that the student in some cases
thinks it wiser to stay friends with his teacher by holding his tongue.
As we saw earlier, instrumental teaching is often described as learning by apprenticeship. I was
therefore interested in finding out if the roles of the master/teacher and apprentice/student are
perceived as consistent with student evaluation.
Teacher C's answer indicates that student evaluation is not a natural part of this teaching tradition:
It is not the usual way of thinking; to let the students evaluate. I don't think it is common among my
colleagues or myself. In this master-apprentice tradition you are what you are, namely in this case, a
musician ... It is not natural for the master to ask for an evaluation, because the master is, per definition,
a master ... student evaluation is not perceived as natural within the master-apprentice tradition, it just
isn't; you only destroy yourself. (Teacher C)
At the same time, both this teacher and most of the others I interviewed were anxious not to be
identified with a master role in the sense of someone who has all the answers, and several of them
expressed a strong wish to reduce their authority towards the students.
Some of the students I interviewed also state very clearly that they do not feel the authority of the
teacher as a hindrance in the sense that the teacher would object to being evaluated. On the contrary,
several comment on the fact that they perceive their teachers as being anti-authoritarian and open to
feed-back and criticism.
Nevertheless, it might not always feel natural for the students to evaluate and be critical of their
teacher. One student expresses himself in this way:
But it is sometimes a bit ridiculous that you as a 20-year old should criticise a teacher who has 30 years
of experience. ... I have that much respect for C's experience not to criticise her teaching in this way.
You have to accept it as it is. (student c2)
Teacher J seems to agree. He claims that student democracy has gone too far, and that student
evaluation is neither appropriate nor necessary:
If we are supposed to have the best teachers in Norway here, I feel that, in a way, this should be quite
unnecessary. (Teacher J)
And even if the teachers might wish to play down their authority, the students might not perceive it in
the same way. One of the students claims that the teachers might not realise how strong their authority
in reality is, and that they perhaps underestimate their power over the students:
At least I feel that in this master-apprentice relationship in which we actually find ourselves, the teacher
has a lot of power. ... this power is not obvious to the person possessing it, only to the one who might be
exposed to it. I have been teaching enough myself to know that you don't feel very powerful when you
stand there [in front of a class or student], but nevertheless you are, because it is your agenda, it is your
word that counts. It is easy to forget, it is all too easy to forget, when I teach. And I suppose it is as
easy to forget for an instrumental teacher, also because you have such a friendly relationship with your
student. (student a1)
Furthermore, evaluating the teacher might for some students be incompatible with having great
professional confidence and trust in his teacher, a trust that seems to be fundamental in learning by
apprenticeship:
In my opinion, to put up too much resistance against the teacher or the type of system he has, just doesn't work,
especially in the type of teaching tradition that we have. I think you have to decide to go along with him
entirely, or otherwise you have to find yourself another teacher. (student a1)
Teacher E expresses somewhat the same attitude when looking back to his own student days:
E: It is a question of faith, to subject oneself to teaching. It is a question of believing in it.
E: Yes, for me it was. I had to make a choice: either I was suspicious and distrustful, or I just had to «swallow»
what he came with. And then, in a way, you have put behind you that dispassionate and critical attitude. You
have to have faith in the person and trust that this will work out. (Teacher E)
The results indicate that there might be some role expectations built into this kind of teaching that can
make it difficult for the student to have a dispassionate and appraising attitude towards his own
teacher. The teacher's professional authority per se sometimes seems to be an obstacle, even if the
teacher himself does not necessarily stress his authority or expect any reverence. The reason might
just as well be that the student needs to have complete faith in his teacher as a professional authority.
But the results also indicate that some teachers might feel student evaluation to be alien to the kind of
roles he and his students have within this teaching tradition.
Conclusion
We have seen that the teacher-student relationship plays a decisive role in the student's development
towards becoming a professional musician. His professional trust in his teacher is a fundamental
condition in this relationship. In his book Personal Knowledge, Michael Polanyi (1958:53) underlines
the importance of this almost blind trust when he writes: «You follow your master because you trust
his manner of doing things even when you cannot analyse and account in detail for its effectiveness.»
It seems, however, that it is not always easy to combine this trust with a more democratic relationship
and a dispassionate and appraising attitude. Both teachers and students might feel that student
evaluation confuses the roles.
Individual instrumental teaching is a kind of teaching that normally creates closeness between teacher
and student, but it also presupposes closeness to succeed. The results of this study indicate that
students might be very anxious not to destroy this intimacy and confidence. In this situation student
evaluation can be a double-edged sword. On the one hand it can help the student to express any
negative feelings he might have in a regulated and accepted context, and thereby contribute to
reducing the tension in the dyadic relationship. Gaining insight into the needs and feelings of the
student will also enable the teacher to adapt his teaching and thereby prevent future disappointments
and frustrations. On the other hand it seems as if student evaluation in some cases actually results in a
deterioration of the relationship because the teacher cannot handle negative evaluations and feels hurt
or even becomes hostile. In other words, in some cases student evaluation can be counterproductive.
The students seem to be painfully aware of this possible outcome, and their strategy, in some cases,
seems to be to keep quiet. They prefer to live with the problems, rather than tackle them by criticising,
or they change teachers if it becomes too much of a strain. In many cases this fear of the teacher's
reaction might be groundless. Many teachers probably handle criticism professionally and do not let
the student notice any negative reaction. But at the same time, the students' tales of the experiences they
have had trying to voice criticism to teachers through the years give grounds for concern. This
underlines how important it is that the teacher has a highly developed professionalism and
ethical awareness in his role as a teacher. If not, student evaluation might actually make things worse
for the student.
Instrumental teachers are, naturally, only human beings: human beings who invest a lot of time,
commitment and professional reputation in their work as teachers. Disappointment and anger are
therefore understandable reactions when the student is dissatisfied, or does not want to accept what
one has to offer. In such a close relationship both parties are dependent on each other for support and
acknowledgement, and, therefore, they have power over each other. As the Danish philosopher Knud
Eilert Løgstrup (1999) says, in every relationship we hold something of another human being's life in
our hand: We are each other's destiny and therefore have power over each other. The teacher is the
student's destiny, but the student is to a large extent also the teacher's destiny. We understand that
when we listen to teacher J tell about his reactions to being criticised. Intimate relationships imply that
one has to reveal oneself to the other person, and for that trust is a precondition. «Acknowledgement,
respect and consideration can only develop between persons who dare to expose themselves to each
other in the conviction that they will not be rejected by the other party» (Bergem 1998:80). Criticism
and negative evaluations can easily be regarded as a rejection of what one stands for both as a teacher
and as a musician, and a natural reaction to this might be a feeling of hurt or anger. That is why it is so
important that the teacher is aware of the ethical demands that are ingrained in his role as a teacher.
The teacher is always the stronger party in a relationship that is by definition asymmetric, no matter how
close it might be. This imposes on the teacher an ethical responsibility towards the student: He has to
control his own reactions, and put his own needs aside in favour of the student´s.
A first condition for being able to act morally is to be aware that one is facing an ethical demand
(ibid.): The teacher must realise what a high price the student might have to pay for his honesty if it is
not met in a decent and ethically justifiable manner. Otherwise, the teacher risks, if only through
ignorance or thoughtlessness, misusing his power to preclude the student from the possibility of
criticising; criticism that might be both justified and necessary in order to improve his learning.
References:
Bergem, T. (1998). Læreren i etikkens motlys (The teacher in the light of ethics). Oslo:
adNotam Gyldendal.
Braskamp, L. A., & Ory, J. C. (1994). Assessing Faculty Work: Enhancing Individual and
Institutional Performance. San Francisco: Jossey-Bass Publishers.
Centra, J. A. (1993). Reflective Faculty Evaluation. San Francisco: Jossey-Bass
Publishers.
Ericsson, K. A. (1997). Deliberate practice and the acquisition of expert performance: An
overview. In H. Jorgensen and A. C. Lehmann (Eds.), Does practice make perfect?
Current theory and research on instrumental music practice (pp. 9-51). Oslo: Norges
musikkhøgskole.
Kingsbury, H. (1988). Music, Talent and Performance: A Conservatory System.
Philadelphia: Temple University Press.
Kvale, S. (1996). InterViews. An introduction to Qualitative Research Interviewing.
Thousand Oaks: Sage Publications.
Proceedings paper
Motivation and Expertise:
The role of teachers, parents and ensembles in the development of instrumentalists
Beatriz Ilari
bilari@po-box.mcgill.ca
McGill University
Faculty of Music
555 Sherbrooke Street West, Montreal, QC H3A 1E3, Canada
The outstanding performance of remarkable individuals has long interested scholars, educators and
researchers (Ericsson & Charness, 1994; Csikszentmihalyi, 1996). Expertise is the term used to designate
optimal human performance and there are many reasons that explain our interest in it. From a sociological
viewpoint, we understand that exceptional performance in certain domains is a culturally valued behavior
(Simonton, 1999) and is often synonymous with success. Many terms, such as gifted, prodigy and genius,
designate and distinguish exceptional performers from the rest of the population, although these terms
have changed over time (Csikszentmihalyi, 1996). From an educational viewpoint,
we are fascinated with the possibilities of understanding the cognitive processes employed by these
outstanding performers, as such understanding could perhaps shed light on the development of educational
strategies which might help a large number of individuals to achieve. According to Collins, Brown and
Newman (1989), as learners we tend to compare our performance to that of the expert, in order to
situate our knowledge within the domain and improve our skills. Yet, the notion of expertise has gone
through several transformations. In the past, experts were able to handle different tasks in different domains
but, as culture has evolved, domains have split into sub-domains and specialization has been a natural trend
(Csikszentmihalyi, 1996). According to Sternberg (1998), expertise involves acquisition, storage and
utilization of at least two types of knowledge: explicit knowledge of a domain and implicit or tacit
knowledge of a field. It is thus presumable that experts have knowledge of the facts, formulas and main
ideas of a domain, as well as a "non-verbalized knowledge form", both of which are needed to succeed in a field.
In the last fifteen years, many cognitive scientists have studied the expert performance of individuals in
domains that use symbolic representation such as mathematics, calculation, chess and music (Ericsson &
Charness, 1994). In the domain of music some studies have been carried out to investigate the performance
of expert musicians. While some studies concentrated on the thought processes employed by expert
musicians (see McPherson, 1997; Whitaker, 1996; Younker & Smith, 1996), others focused on the factors
or motives that lead performers to devote a large amount of time to music.
Motivation plays a very important role in the understanding of expert performance. The literature suggests
two types of motivation often related to music: intrinsic and extrinsic. Sloboda (1993) explains that
intrinsic motivation is developed from intense pleasurable experiences with music, which might lead to a
deep and fulfilling personal commitment to music whereas extrinsic motivation is more related to achieving
certain goals such as parental/peer approval or winning competitions than to music itself. It seems that all
individuals have a mixture of the two types of motivation (Sloboda, 1993).
Research has shown that, on average, elementary schools start providing instrumental instruction at age
nine (Martignetti, 1966; Mackenzie, 1991). Children who are engaged in private instruction usually start
earlier depending on instrument choice and family/cultural influence. Klinedinst (1992) concluded that
approximately 25 percent of students who start instrumental lessons discontinue them after one year of
instruction. Henson (1974) found that the decision to drop out of instrumental education usually happens
during the first three years of instruction. As Sloboda and Howe (1991) pointed out, only a minority of
beginners will persist in taking lessons until they reach a high level of musical competence.
Parental involvement and support seem to be a strong motivational factor in instrumental instruction.
Parents who support their children in the early years, regardless of their musical competence, are an
important source of motivation (Doan, 1973; Davidson, Sloboda & Howe, 1995). A supportive
environment is clearly important for the success of the young instrumentalist (Allen, 1974; Allen, 1998;
Bonifatti, 1997; Davidson, Howe, Moore & Sloboda, 1996; Henson, 1974; Martignetti, 1966; Webber,
1974).
The relationship between music teacher and student is also very important when studying motivation for
instrumental learning. Sandene (1997) suggested that students are often discouraged by the negative attitude
of teachers who are too concerned with achievement and have an ego-goal orientation, creating fear instead
of pleasure and enjoyment during lessons. Davidson & Scutt (1999) studied the teacher, student and
parents' interaction before, during and after musical examinations and concluded that, although the music
learning process is of a "triadic nature" involving teacher, student and parents, the teacher is still the central
figure, responsible for shaping the learning experience by mediating the relationship between parents and
students. The study also emphasized that teachers' comments play a critically important role in the
learning process, as parents usually count on teachers' opinions and ideas. Sloboda and Howe (1991), who
studied the lives of young musicians, found that teachers are extremely important, especially in the early
years of instruction. While young musicians who persist in instrumental instruction learn how to
differentiate and distinguish between professional and personal qualities of their instructors, children who
drop out of instruction cannot make such judgements (Davidson, Howe & Sloboda, 1995). A student's first
instrumental teacher is very important, and personal warmth seems to be an essential characteristic when
working with young musicians (Howe & Sloboda, 1991).
Many researchers have looked at the role of affect in motivation. Enjoyment and pleasure seem to be
important for keeping students motivated in instrumental instruction. Csikszentmihalyi (1990) believes that
music instruction often overemphasizes how children perform rather than what they experience. Sloboda
(1993) suggests that when too much emphasis is placed on achievement, especially in the early stages of
learning, intrinsic motivation is often inhibited and students experience anxiety instead of pleasure. Many
parents and teachers have expectations which are too high and generate great stress in the child, instead of
enjoyment. Indeed, as Howe and Sloboda (1991) suggested, many students discontinue their musical
instruction due to a lack of enjoyment when playing and practicing their instruments.
Disinterest in music seems to affect students' decisions to continue music instruction. Many students lose
interest in music due to an inadequate choice of instrument (Martignetti, 1966; Henson, 1974; Allen, 1995).
The difficulty of the instruments is also mentioned as a cause for loss of interest in music as suggested by
Martignetti (1966) and Henson (1974). Casey (1964) suggested that loss of interest is sometimes related to
students' inability to achieve a satisfactory level of performance. Loss of interest in music is often not
easy to explain in words, although it is responsible for many student dropouts from instruction.
Students' beliefs about success and failure in music contribute enormously to their persistence in instrumental
instruction. Many students believe that they are not "good enough" for music and therefore drop out
of instruction. Asmus (1986) investigated children's beliefs about success and failure in music. He found that
students, while young, tend to believe that effort is what justifies their success or failure in music. As
students mature, their beliefs change and they tend to attribute their success or failure in music to ability.
Students' beliefs should be considered and taken into account, as they have clear effects on achievement
(Asmus, 1986; Chang & Costa-Giomi, 1993).
Other studies that investigated the role of motivation in instrumental music education suggest that, while in
school, students are often forced to choose between music and other classes or activities, and consequently
drop out of instruction (Martignetti, 1966; Casey, 1964; Henson, 1974).
While most of the studies tend to look at children's motivation, a rather small number of studies have
investigated the perceptions and opinions of adult instrumentalists, recalling their experiences as students.
This is not to say that instrumental music education should be viewed as, or geared towards, a musical career choice; but
when a student chooses to become a professional musician, it can be assumed that she or he is
motivated for music. In a longitudinal study, Manturzewska (1990) investigated the life-span development
of Polish professional musicians and concluded that family environment and intrinsic motivation are the
most important contributors to an effective and meaningful instrumental education. The study also
mentioned the importance of teachers, colleagues and social and emotional support in the motivation of
instrumentalists. Sosniak (1985) interviewed North American professional pianists in the beginning of their
careers, as well as their parents and confirmed the assumption that long term support from parents and
teachers is essential for a successful instrumental education. The importance of instrumental practice was
also emphasized by the vast majority of musicians interviewed by Sosniak (1985).
The purpose of this exploratory study was to compare the opinions of Brazilian and Canadian adult
instrumentalists on motivation in instrumental music education. The variables studied were teachers'
influence, parental involvement, musical background, and participation in ensembles. The study examined
whether Brazilian and Canadian musicians answer in a similar manner, or agree about the relevance of these
variables as motivating factors.
The hypotheses for this exploratory study were:
1. Teachers are the most important influences in instrumentalists' lives, regardless of cultural differences.
2. Both Canadian and Brazilian musicians consider participation in ensembles essential to students'
development.
3. All musicians, regardless of cultural background, have thought about discontinuing their musical
activities.
METHOD
Thirty-one musicians participated in the exploratory study: 18 Brazilian and 13 Canadian. The Brazilian
musicians (12 females and 6 males) ranged in age from 23 to 43 (mean age of 31), started playing their
instruments between ages 5 and 23 (mean age of 12), and had been playing their instruments for 11 to 20 years
(mean of 16). The Canadian musicians (7 females and 6 males) ranged in age from 20 to 37 (mean age of 26),
started playing their instruments between ages 4 and 22 (mean age of 10), and had been playing their
instruments for 2 to 30 years (mean of 14). The instruments played by these musicians were violin (6),
viola (2), cello (1), double bass (3), flute (1), oboe (1), clarinet (1), French horn (2), trombone (1),
percussion (1), piano (6), voice (1), guitar (1), and multiple instruments (4). All musicians in the study
expressed their desire to continue their careers as professional musicians.
A survey in Portuguese and English was developed to gather information on the instrumentalists'
main musical activity, musical background, important influences in their studies, intentions to drop out of
instruction or discontinue musical activities, family musical background and support, and participation in
ensembles and the importance of such practices. Subjects were also asked to write a short and concise
definition of a good instrumental teacher.
All Brazilian surveys were translated into English. To verify the consistency of the translations, all
answers were checked by an English-Portuguese translator. For each question of the survey, verbal answers
were analyzed, classified, and categorized with the use of numbers. Similar answers received the same
numbers, and profiles were created to assist in the interpretation of the results. Since many questions
were of a rather descriptive nature, many respondents answered in essay form. Such answers were then
classified and categorized with multiple numbers, which explains why there was often a larger number of
answers than of respondents.
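The multiple-number coding described above can be sketched as a simple tally. The category numbers and answers below are hypothetical illustrations, not the study's actual data:

```python
from collections import Counter

# Hypothetical coding: each essay-style answer may touch on several
# categories, so it receives multiple category numbers.
coded_answers = {
    "respondent_1": [1, 4],  # e.g. mentioned both a teacher and an ensemble
    "respondent_2": [1],
    "respondent_3": [2, 6],
}

# Tally responses per category across all respondents.
tally = Counter(num for nums in coded_answers.values() for num in nums)

total_responses = sum(tally.values())    # 5
total_respondents = len(coded_answers)   # 3

# With multi-number coding, responses can outnumber respondents,
# as the study observed.
assert total_responses >= total_respondents
```

Category percentages such as those in Table 2 would then be computed over the total number of responses rather than the number of respondents.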
2. Enchantment - heard a particular piece or performer/group and was instantly drawn to music.
3. Had a musician in the family, so that music was a "natural" or mandatory choice.
an instrument. Surprisingly, while a larger number of the Brazilian musicians in this study attributed their
initial motivation for playing an instrument to enchantment, only one Canadian musician answered in a
similar manner. The question that remains is whether this answer is connected to a cultural issue or simply
to chance. Further research would be needed to address it.
Still on the initial motivation to start instrumental instruction, it was found that 47% of Canadian and
28% of Brazilian responses related to the idea that the motivation to play an instrument often comes
from some innate desire that is not always easy to explain in words. This finding suggests that there might be
some individuals naturally drawn to music. As researchers, we still need to investigate these desires or
natural attractions in order to understand their nature, genetic or not, and to situate and frame learning within
such a context.
Teachers seem to be the most important influences in the lives of instrumentalists regardless of their
cultural background: 44% of the Brazilian and 37% of the Canadian responses emphasized the
importance of teachers in the lives of performers. However, Brazilian and Canadian musicians differed in
their second choice of influence; while 27% of Canadian responses addressed the importance of
participating in an ensemble as a motivator for instrumental music education, Brazilians suggested other
issues as important motivators, such as participation in a particular ensemble or the desire
to have a "particular life style" peculiar to artists. Table 2 shows the responses of both
Brazilian and Canadian subjects.
Table 2. Most significant influences in musicians' lives

Category Number   Description                      Brazilian Responses   Canadian Responses
2                 Musician/Performer/Group         7%                    21%
3                 Musical Work                     3%                    5%
5                 Participation in a competition   -                     5%
6                 Other                            27%                   5%
As mentioned earlier, when discussing the major influences in their musical lives, most subjects mentioned
an inspiring teacher as the main influence. This supports the hypothesis of teachers as the most
important influences in the lives of instrumentalists. However, many teachers also play an important role in
discouraging students. Swanwick (1994) believes that instrumental instruction often involves
elements of luck and chance, as it is done on a one-to-one basis and is rather idiosyncratic, depending
mainly on the teacher's approach. Concern with the education of instrumental teachers has grown
enormously, and many schools in the United States offer programs developed exclusively for those who
wish to dedicate their lives to instrumental music education.
The most difficult answers to categorize were those defining a good teacher. Although the
definitions of a good instrumental teacher were similar in content, Brazilian and Canadian
instrumentalists answered in different manners. Canadian musicians tended to be more objective and
concise, using fewer words to describe their ideal teachers:
Supportive and demanding.
Original.
Brazilian musicians tended to use more words to describe their ideal masters:
Someone who is updated, open minded and allows creativity and new ideas.
One who stimulates students to love and get more involved technically and artistically speaking.
One who helps students develop a critical view of his/her own development and shares the "secrets" of the
profession.
However, both groups, Brazilian and Canadian, agreed that a good teacher is someone
who has a good technique, is knowledgeable about his or her instrument, and is able to relate it to important
elements of music history and theory. Other ideas presented in the surveys related to the
teacher's patience, respect for the student, and attitudes of faith towards students:
A good teacher has a great deal of faith in his/her students' ability to become a good performer. They
communicate clearly and don't give up on trying to find ways to express important concepts to his/her
students.
An ideal teacher is someone who can look at their students and guide them to what he/she believes is their
largest potential; if the student is talented for chamber music, the teacher should emphasize that, if the student
is talented for solo performance , then that should be emphasized and so on. That is the art of teaching, is
knowing how to bring out the best of each student in a respectful way.
It seems that there will never be one single definition of a good teacher, as people are different and learn in
different ways and at different paces. What does seem to be a consensus is the need for knowledge: teachers
must be good instrumentalists themselves. A good relationship with the instrumental teacher is also
desirable and might enormously influence the student's motivation.
The issue of continuing or discontinuing musical activities generated the largest controversy. Table 3
presents the results for Brazilian and Canadian subjects regarding the question:
"During your life as a music student, did you ever consider discontinuing your music lessons
and activities?"
Table 3. General Responses; Table 3a. Brazilian Responses by Gender; Table 3b. Canadian Responses by Gender
Interestingly, 88% of the Brazilian subjects mentioned a desire to stop playing their instruments
while still studying, whereas only 38% of the Canadian subjects expressed such an intention. Many reasons were
given to explain the desire to discontinue instrumental instruction. Some were of a rather philosophical
nature, with subjects questioning the validity of music in their lives and their positioning towards it:
I went through a period of questioning. Why music in such a messy world?
I felt a lack of motivation to create new things. It seemed that there was nothing new to be learned as there was a constant repetition of ideas in
instruction.
Many musicians mentioned the difficulties of music itself and the type of commitment that is necessary to
achieve a satisfactory performance level:
I wasn't sure if I was good enough for music.
I thought about discontinuing music when I found out how much work is needed in order to achieve a satisfactory performance level.
Other instrumentalists thought about discontinuing their instruction when they found out about the working
environment and its conditions. This answer appeared solely in the surveys of the Brazilian musicians, and
was the main reason given to explain the desire to discontinue instrumental instruction. Knowing that
Brazil has no strong tradition of instrumental music education, and that students often face many difficulties
in pursuing their education, it is not difficult to understand the wish to drop out of instruction. Another
explanation, also of a socio-economic nature, is the fact that Brazilians are often forced to start
working professionally while still studying, and are often not ready to face the challenges and routine of
the working environment; they are then more likely to be disappointed with that environment and
its realities. Also, as Gainza (1984) pointed out, musicians in Latin America deal with constant political and
economic changes that thoroughly affect their studying conditions. Still, more studies are needed to
investigate these important social, economic, and cultural issues.
The data obtained from the question on discontinuing instruction clearly indicate the existence of two
different types of motives, which we could call internal and external. External motives are those external
to the individual, such as financial difficulties or a lack of qualified teachers. Internal
motives are those particular to each individual, exemplified by the answers that
referred to periods of questioning the validity of music in one's life or to a loss of interest in music. These
external and internal motives seem to be associated with the concepts of extrinsic and intrinsic
motivation. Although difficult to classify, in this particular study the vast majority of responses related to
continuing or discontinuing music instruction were of an internal nature. This suggests that intrinsic
motivation plays a very important role in the development of performers. The difficult issue lies in finding
ways to help students develop intrinsic motivation. Addressing this question, Deci (1995)
suggested that, to foster intrinsic or self-motivation, the emphasis should be on creating the
conditions within which people will motivate themselves, rather than on trying to motivate them. Deci (1995)
added that extrinsic motivation works only when the goals and results are known in advance and matched.
Nevertheless, there are many ways to foster intrinsic motivation through external factors.
Family support plays an essential role in instrumental music education. The vast majority of musicians
mentioned an amateur musician in the family. Others described great inspiration from siblings, parents,
or relatives who were involved in music, and how music seemed a natural path to them. The few
musicians who had no family support mentioned difficulties in staying motivated for music, and stronger
desires to discontinue their instruction.
Most subjects mentioned that they had family support while playing their instruments. Among Canadian
subjects, 92% said they had parental support while pursuing their instrumental instruction and 8% answered
that they had "more or less" parental support; no Canadian subject mentioned a lack of parental support.
Among Brazilian musicians, 78% mentioned support, 17% used the term "more or less", and only 5% felt a
lack of support from parents. Interestingly, 81% of the total number of subjects (Brazilian and Canadian)
commented on the existence of at least one person in the family playing a musical instrument. The
instruments played by the family members varied: violin, piano, guitar, accordion, flute, recorder, cello,
bassoon, oboe, and voice.
Another agreement found in this study relates to ensemble experience. All respondents, without exception,
mentioned the importance of ensembles in instrumental music education. Ensemble experience is seen as a
way of broadening students' perspectives on music, as the student learns more repertoire and develops
attention, focus, concentration, discipline, and intonation. Students also interact with other people, learning
respect and leadership; this is especially important considering that the vast majority of musical activities
involve ensemble work. One subject mentioned that, through ensemble experience, students learn early to
sacrifice their own musical ideas for the greater goal of the grand musical scheme, something they will
experience at some point in their musical careers. Ensemble experience was thus considered an important
form of education.
One question that was raised related to the number of Brazilian and Canadian musicians involved in
orchestral activities. Since there was a larger number of orchestral musicians among the Brazilian subjects,
it was hypothesized that Brazilian students probably enroll in orchestral activities earlier than Canadian
students due to financial needs, as most Brazilian youth orchestras offer small stipends or salaries to their
participants. Another hypothesis was that there are fewer ensemble possibilities for students in Brazil than
in Canada, which might explain the larger participation in orchestras. Still, further research is needed to
answer these questions.
CONCLUSION
The present study investigated the role of teachers, parents, family, and participation in ensembles in
students' decisions to continue playing musical instruments. In agreement with previous research on
motivation in instrumental instruction, this study found that teachers and family provide a very important
source of motivation for instrumentalists regardless of their culture. The same can be said about
participation in ensembles: as many musicians described, by participating in ensembles students learn to
share musical ideas, gain sight-reading skills, and develop perception, attention, concentration, and
leadership.
However, cultural aspects should be taken into account when considering students' motivations for starting or
discontinuing music training. In this particular study, most musicians had parental support and had a
musician, professional or amateur, in the family. Perhaps a study conducted with a larger number of
subjects would provide different results. Interestingly, it was evident that, depending on cultural
background, subjects responded to the questionnaires in different ways, using more or fewer words and
different expressions reflecting different understandings. These cultural aspects are extremely important, as
they show us differences and similarities among people, and should be carefully observed in further
research.
Nevertheless, intrinsic motivation seems to be a key concept in the development of performers, regardless
of cultural background. It is still unclear how people gain and lose interest in music, or what causes them
to initiate or terminate musical activities and performance. We could speculate that intrinsic motivation is
related to feelings of competence and autonomy, as suggested by Deci (1995) and Csikszentmihalyi (1990).
Perhaps teachers could help students develop a sense of competence and autonomy by setting goals that
are neither beyond nor below each student's capacity or needs, creating conditions for the development of
intrinsic motivation.
Still, there are many questions related to motivation for music that we are not yet able to answer
precisely. Are there genetic differences that determine people's involvement in activities that require a great
deal of practice, patience, and concentration, such as music? Can we compare music to passion in the sense
that, in both, we get "enchanted" and, if we are able to maintain this "enchantment" for long periods of time,
we might develop strong skills and the intrinsic motivation to make a long-term commitment? In other words,
can we as teachers help students keep enjoying music throughout the years while teaching them the necessary,
and often difficult, skills?
In summary, there is still much to be discovered about human motivation for music. Hopefully, research in
this area will help us understand and teach the beauties and challenges of music in a more meaningful way,
one that considers enjoyment and pleasure. A student's relationship with the instrument can be a very rich
one; we just have to help students find their own paths of development and enjoyment through instrumental
music. As Leonard Bernstein (1964), the American conductor and composer, once said:
"Why? Motivated by what? That, thank Heaven, is still a glorious mystery; and it is a mystery that
enshrouds every artist I know, rich or poor, successful or not, old or young. They write, they paint, they
perform, produce, whatever, because life to them is inconceivable without doing so."
REFERENCES
Allen, M.L. (1998). An Investigation of Selected Retention Variables Among Middle School String Students. Paper presented
at the Music Educators National Convention. Phoenix, AZ.
Anderson, M. (1996). A study of motivation and how it relates to student achievement. Canadian Music Educator, 38, 29-31.
Asmus, E.P. (1994). Motivation in music teaching and learning. Quarterly Journal of Music Education - Teaching and
Learning, 5, 5-32.
Asmus, E.P. (1986). Student beliefs about the causes of success and failure in music: A study of achievement motivation.
Journal of Research in Music Education, 34, 262-278.
Austin, J. (1991). Competitive and non competitive goal structures: an analysis of motivation and achievement among
elementary band students. Psychology of Music, 19, 142-158.
Borges-Scoggin, G.A. (1993). A study of the pedagogy and performance of string instruments in Brazil and the social,
cultural and economic aspects affecting their development. (Doctoral Dissertation, University of Iowa, 1993). Dissertation
Abstracts International, 49. (University Microfilms No. 9421227-dd).
Chang, M.L. & Costa-Giomi, E. (1993). Instrumental Student Motivation: An exploratory study. Missouri Journal of Music
Education, 30, 18-25.
Collins, A., Brown, J.S. & Newman, S.E. (1989). Cognitive apprenticeship: teaching the craft of reading, writing and
mathematics. In L.B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser . Hillsdale:
Erlbaum, 453-494.
Csikszentmihalyi, M. (1996). Creativity - Flow and the psychology of discovery and invention. New York: HarperCollins.
Csikszentmihalyi, M. & Csikszentmihalyi, I.S. (1993). Family influences on the development of giftedness. In G.R. Bock
(Ed.), Proceedings of the Symposium on the Origins and Development of High Ability, held at the CIBA Foundation, London,
January 25. Chichester: John Wiley and Sons, 187-206.
Csikszentmihalyi, M. (1990). Flow - The psychology of optimal experience. New York: Harper & Row.
Davidson, J.W. & Scutt, S. (1999). Instrumental learning with exams in mind: a case study investigating teacher, student and
parent interactions before, during and after music examinations. British Journal of Music Education, 16, 79-95.
Davidson, J.W., Sloboda, J.A. & Howe, M.J.A. (1995). The role of parents and teachers in the success and failure of
instrumental learners. British Journal of Developmental Psychology, 14, 399-412.
Deci, E. (1995). Why we do what we do - Understanding self-motivation. New York: Penguin Books.
Doan, G. (1973). An investigation of the relationships between parental involvement and the performance ability of violin
students. (Doctoral Dissertation, Ohio State University, 1973). Dissertation Abstracts International, 49.
Ericsson, K.A. & Charness, N. (1994). Expert Performance - Its structure and acquisition. American Psychologist, 8, 725-747.
Eisner, E. (1993). Objectivity in Educational Research. In: M.Hammersley (Ed.) Educational Research: Current Issues.
Toronto: Paul Chapman Publishing Company.
Gainza, V. (1984). Music in the Americas. International Society for Music Education Yearbook, 11, 30-37.
Henson, M. (1974). A study of dropouts in the instrumental music programs in Fulton County and City of Atlanta School
Systems. (Doctoral Dissertation, Florida State University, 1974). Dissertation Abstracts International, 49. (University
Microfilms No. 1976-01738-dd).
Howe, M.J.A. (1993). The early lives of child prodigies. In G.R. Bock (Ed.), Proceedings of the Symposium on the Origins
and Development of High Ability, held at the CIBA Foundation, London, January 25. Chichester: John Wiley and Sons,
85-105.
Howe, M.J.A. & Sloboda, J.A. (1991). Young musicians' accounts of significant influences in their early lives. 1.
The family and the musical background. British Journal of Music Education, 8, 39-52.
Howe, M.J.A. & Sloboda, J.A. (1991). Young musicians' accounts of significant influences in their early lives. 2.
Teachers, practicing and performing. British Journal of Music Education, 8, 53-63.
Klinedinst, R.E. (1992). Ability of selected factors to predict performance and retention of fifth grade instrumental music
students. Bulletin of the Council for Research in Music Education, 111, 49-52.
Lehmann, A.C. (1997). The acquisition of expertise in music: Efficiency of deliberate practice as a moderating variable in
accounting for sub-expert performance. In I. Deliège & J. Sloboda (Eds.), Perception and Cognition of Music. East Sussex:
Taylor and Francis, 161-190.
Mackenzie, C. (1991). Starting to play a musical instrument: a study of boys' and girls' motivational criteria. British Journal of
Music Education, 8, 15-19.
McPherson, G. E. (1997). Cognitive Strategies and Skill Acquisition in Musical Performance. Bulletin of the Council for
Research in Music Education, 133, 64-71.
Manturzewska, M. (1990). A biographical study of the life span development of professional musicians. Psychology of
Music, 18, 112-139.
Martignetti, A. (1966). Causes of elementary instrumental music dropouts. Journal of Research in Music Education, 13,
177-183.
Sandene, B. (1997). An investigation of variables related to student motivation in instrumental music. (Doctoral dissertation,
University of Michigan, 1998). Dissertation abstracts International, 58/10. (University Microfilms No. AAT 9811178).
Simonton, D.K. (1999). Talent and its development: An emergenic and epigenetic model. Psychological Review, 106,
435-457.
Sloboda, J.A. (1993). Musical ability. In G.R. Bock (Ed.), Proceedings of the Symposium on the Origins and Development of
High Ability, held at the Ciba Foundation, London, Jan. 25. Chichester: John Wiley & Sons, 106-118.
Sloboda, J.A. & Howe, M.J.A. (1991). Biographical precursors of musical excellence: an interview study. Psychology of
Music, 19, 3-21.
Sloboda, J.A. & Howe, M.J.A. (1992). Transitions in the early careers of able young musicians: choosing instruments and
teachers. Journal of Research in Music Education, 40, 283-294.
Sosniak, L.A. (1985). Learning to be a concert pianist. In: B.S. Bloom (Ed.) Developing Talent in Young People. New York:
Ballantine Books, 143-167.
Sternberg, R. (1998). Abilities are forms of developing expertise. Educational Researcher, 27, 11-20.
Swanwick, K. (1994). Musical Knowledge - Intuition, analysis and music education. London: Routledge.
Whitaker, N.L. (1996). A theoretical model of the musical problem solving and decision making of performers, arrangers,
conductors, and composers. Bulletin of the Council for Research in Music Education, 128, 1-13.
Younker, B.A. & Smith, W.H. (1996). Comparison and modeling musical thought processes of expert and novice composers.
Bulletin of the Council for Research in Music Education, 128, 25-35.
Proceedings paper
Toshio IRITANI
CHÔFU WOMEN'S COLLEGE
Kawasaki-shi, Kanagawa-ken
Japan 215-8542
1. Introduction
The purpose of this paper is to argue that there is a different method for adults' learning of music
(especially piano playing): a much faster and more practical way of getting acquainted with the piano.
In formal piano education, music teachers normally start with lessons on simple melodies; students then
gradually go on to more complex melodies, using such traditional textbooks as the "Beyer", which is
written for early beginners.
In the case of adults, however, such a method of learning is not necessary, and this author would
like to propose a more economical, practical, and efficient method of learning music. This is
especially true with respect to piano playing.
The structure of music is very similar to the structure of a sentence written in a foreign language. An
adult who has experience of learning a foreign language, or is acquainted with different kinds of
languages, can probably learn faster to read scores written in staff notation. Just
as written and spoken language consists of sounds, rules for word combination and word order
(grammar), stops, phrases, articulation, paragraphs, and chapters, so music consists
of musical sounds (octaves) that go up or down in steps from lower to higher pitches and vice versa.
Each sound is combined and transformed in harmony, forming parts and phrases.
In addition, in contrast with young children, adults have wide experience and knowledge of a good
number of good melodies. Adults have memorized these melodies and can even sing them when an old
memory reactivates the melodies in their brains. The problem is how to transform these good melodies
into playing on the piano keyboard by reading the notes written by the composer. Even when adults know
the melodies, they still must know the basic rules of musical notation and how it is expressed and
understood. This is the same way that children or mature adults know, consciously and unconsciously, the
rules of grammar and how sentences and phrases are composed before they speak.
My own experience is that I started to play the piano after passing the age of sixty, have been learning
to play for five years, and still have a piano teacher, but I have now reached the stage of playing
Beethoven's Opus 57 (the so-called "Appassionata"), including the first and second movements. In the past
I successfully played in recitals for small groups, to great applause, after learning some easy classical
music, such as Mozart's Andante Cantabile (K. 545), Schumann's Träumerei, Chopin's Nocturne Opus 9, No. 2,
the Prelude Opus 28, No. 15 (the so-called "Raindrop" Prelude), and the Grande Valse Brillante, Opus 18. I
would like to explain the steps by which I succeeded in playing the piano so quickly and was able to have
my performances greatly applauded.
2. The Basic Theory of Adults' Learning of Music: The Cognition and Comprehension of the
6. Also to be understood are the equivalent time values among the notes.
7. Other special notations, such as the slur, crescendo, decrescendo, turn, etc.
8. Performance directions (usually written at the start of a classical piece, usually in Italian, such
as allegro, adagio, andantino, a tempo, etc.); one must decide the speed of play from these
directions. (The above are based on E. Taylor's Music Theory in Practice, 1990, pp. 4-23.)
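The equivalent time values in item 6 can be illustrated with a small worked example, taking the quarter-note beat as the unit (a common convention; the specific note symbols from the original list are assumed):

```python
# Duration of common note values, measured in quarter-note beats.
beats = {
    "whole": 4.0,
    "half": 2.0,
    "quarter": 1.0,
    "eighth": 0.5,
    "sixteenth": 0.25,
}

# One whole note equals two half notes, which equal four quarter notes.
assert beats["whole"] == 2 * beats["half"] == 4 * beats["quarter"]

# A dotted note lasts 1.5 times its plain value: a dotted half = 3 beats.
dotted_half = 1.5 * beats["half"]
```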
In addition, there are many more special signs that are occasionally encountered in compositions, such
as the natural (cancel), staccato, trill, tremolo, etc. Students should also be assisted with fingering
positions by a piano teacher.
The next step is to apply the acquired notational knowledge to the text written by the composer. This
process occasionally helps to activate a melody stored in the adult's past memory.
1. Practice with Some Simple Melodies to Get Acquainted with the Above Notations
Here are two abbreviated melodies. The first, written by Beethoven, is the last part of the "Chorus" in
the 9th Symphony; the second is the beginning of Johannes Brahms's Symphony No. 1. Both melodies are
heard quite often and everybody knows them quite well. If an adult tries to play these streams of notes
while paying attention to their basic time values, he or she can play very easily and comfortably.
A. Chopin's Example
Chopin produced a number of difficult classical piano works in different genres (Mazurkas, Polonaises,
Waltzes, Ballades, Nocturnes, etc.) which liberalized tonal structures, combined elements of Polish dance,
used bright tones, and expressed his delicate moods and sentiments in sublimated forms. This is especially
evident in his Nocturnes, Etudes, and Waltzes, which use a number of alternations of sharps and flats in
the staves. Some of his earlier and simplest works can be analyzed in the following way.
1. There is a continuation of the same tones (cf. the beginning and middle of the "Raindrop" Prelude,
and also the start of the Grande Valse Brillante, Opus 18).
2. There is also a repetition of a group of harmonic melodies.
3. In these two compositions there is no major modulation of the tonal structure, compared with his more
difficult pieces, so an inexperienced pianist can follow the stream of melodies in such phrases after
grasping the whole structure of the two examples. Here are two relatively simple examples of Chopin's
work mentioned above.
(cf. Chopin's Prelude Opus 28, No. 15 (the "Raindrop" Prelude), in the opening phrase and the middle
part depicting the rainfall, and the opening paragraph of the Grande Valse Brillante, Opus 18.)
B. Beethoven's Sonatas
Although Beethoven composed 32 different sonatas in his life, and the style and contents of his works are
found to be quite different from one stage of his career to another, the characteristic tonal elements of his
phrases (mostly melancholic, suddenly bursting out in tones) can be traced in each of his compositions.
His best-known sonatas were composed in the period when Beethoven was in his late twenties and early
thirties. These sonatas are his Opus 13 ("Pathétique"), Opus 27-2 ("Moonlight"), and Opus 57
("Appassionata").
In the latter two sonatas, some of the easy parts are found as follows. The "Moonlight" starts with four
groups of sol-do-mi in the first bar of the treble clef and changes gradually to la-do-mi, la-do-mi, la-re-fa,
and la-re-fa in the next bar, under the performance direction adagio sostenuto.
While the opening passage of Opus 57 ("Appassionata") is very fast (allegro assai), followed by slow and
fast bursting phrases alternately in 12/8 time, the second part consists of harmonic melodies written in both
treble clefs. The melodies are la-do, la-do-fa, sol-si-mi, three sol-sis, la-do, sol-si (upper tones), sol-si
(middle), and sol-si (lower tones), and the second tones written in the treble clef consist of the repetition of
mi (upper tones).
The second part starts with a bass-clef la-do-mi-la, la-do-mi-la, la-do-mi-la, and la-do-mi-la and goes up to
a series written in the treble clef. This series is do-mi, la-la (a combination of upper and lower tones),
do-do, la-la, sol-sol, sire-sire, re-re, sol-sol, la-la, and mi, fa-re, do-mi, do-si, and mi-re-do-mi, etc., which
constitutes a song of praise for a lover (who may have been one of Beethoven's sweethearts).
(cf. the beginnings of Beethoven's "Moonlight" and Opus 57 ("Appassionata").)
2. Piano playing is analogous to the mechanical learning of other skills such as using computer keyboards,
word processors, e-mail, and the Internet, which are all recently developed technological innovations. The
only difference between music and these other techniques lies in the skill of hearing musical sounds and
memorizing them distinctively. With regard to finger movements, the mimicry of a piano teacher's motor
movements seems to be very important.
3. The problem of the speed written at the head of each composition (the performance direction, such as
allegro, presto, or largo) and the problem of good coordination of the left and right hands still exist, but
these will be improved by further practice and by listening to many performances by experienced pianists.
4. On looking back, I can see that my first learning (the understanding of the music fundamentals) was
rather slow, but from the beginning I had a skill for expressing the melodies that I have heard since my
childhood. I also had a strong motivation to become a good piano player and musician, and I did not forget
to practice; what seemed difficult at the beginning could be overcome by later rest and practice. In this
context, I think what Professor Bartlett called "effort after meaning" was activated in my mind concerning
the memorization of melodies; that is to say, a schema of tonal elements was enlivened unconsciously in
my brain (Bartlett, 1932, 1995).
Now I have had a good experience of the deep feelings and delicate emotions of composers, and of how
they expressed themselves in their compositions.
I can now identify with them in the expression of melodies, harmonies, and rhythms with certain forms of
musical notation, phrasing and articulation, modulation, ornamentation, and pauses.
Notes
1. 1) Beethoven's Symphony No. 9 adapted from Tomoe Kitamura (1994), Piano Lessons for Adult
Beginners, Ongaku-no-tomo-sha, p. 19.
2) Brahms's Symphony No. 1 from James Bastien (1981), Favorite Classic Melodies, Kjos West,
San Diego, California, p. 9.
2. 1) Mozart, KV 545, Zen-on Piano Library (1956), Mozart Sonaten 2, p. 236.
2) Mozart, KV 525, Eine kleine Nachtmusik, Zen-on Piano Library (1988), p. 4.
3. 1) Chopin, Opus 28-15, Zen-on Piano Library (1955), pp. 29-30.
2) Chopin, Grande Valse Brillante, Opus 18, Zen-on Music for Piano, No. 128, p. 1.
4. 1) Beethoven, Moonlight Sonata, Zen-on Music for Piano, No. 1, p. 1.
2) Beethoven, Opus 57 (Appassionata), G. Henle Verlag, p. 4.
References
1. Taylor, E. (1990). Music Theory in Practice. London: The Associated Board of the Royal Schools of
Music.
2. Baxter, H. and Baxter, M. (1993). The Right Way to Read Music. U.K.: Right Way.
3. Keller, H. (1955). Phrasierung und Artikulation. Translated by Uemura, K. and Fukuda, T. (1969).
Tokyo: Ongaku-no-tomo-sha.
4. Köhler, W. (1947). Gestalt Psychology: An Introduction to New Concepts in Modern Psychology. New
York: Liveright; reprinted in 1970.
5. Bartlett, F.C. (1932). Remembering: A Study in Experimental and Social Psychology. Cambridge:
Cambridge University Press; reprinted in 1995.
Proceedings paper
Competence Beliefs
These refer to individuals' beliefs about their ability and their own evaluations of their competence in
different domains. The beliefs relate to their achievement performance, choice of achievement tasks,
the amount of effort applied, cognitive strategy use, achievement goals and overall self-worth
(Wigfield and Eccles, 1994). Harter (1992) states that an individual's perception of competence may
contribute directly to motivation. Researchers have been interested in how these beliefs may change
over time (see for example, Wigfield, Eccles, Mac Iver, Reuman, and Midgley, 1991; Wigfield,
Eccles, Yoon, Harold, Arbreton, Freedman-Doan, and Blumenfeld, 1997).
Expectancies for Success
These are defined in terms of individuals' beliefs about how well they will do on an upcoming task
(Wigfield, 1994), i.e. the confidence they have in their ability. Eccles et al. (1983) state that
achievement expectancies play a significant role in an individual's academic choice. Consequently, it
is important to identify those components that are shaping those expectancies. They state that these
expectancies are most directly influenced by self-concept of ability and the estimation of the difficulty
of the task.
Eccles et al. (1983) define self-concept of ability as "the assessment of one's own competency to
perform specific tasks or to carry out role-appropriate behaviours" (p.82). It is formed through a
process of observing and interpreting one's own behaviours and the behaviours of others. In terms of
an individual's perception of the difficulty of the task, Eccles et al. conclude that these perceptions
may influence self-concept of ability in that individuals who rate a task as more difficult develop
lower estimates of their own abilities for that task. One other important factor that is believed to be
influential in shaping an individual's self-concept is the perception of others' expectations.
Self-efficacy
Proposed by Bandura (1994), it is defined as individuals' confidence in their ability to accomplish a
task. Eccles, Wigfield and Schiefele (1998) state that Bandura's theory emphasises expectancies for
success, distinguishing between two types of expectancy beliefs: outcome
expectations (beliefs that certain behaviours will lead to certain outcomes) and efficacy expectations
(beliefs about whether one can perform the behaviours necessary to produce an outcome). These two
are distinguished in that an individual may believe that a particular behaviour produces a particular
outcome, but may not believe that they can do that behaviour.
An overview of the literature on self-related beliefs has been presented in order to provide a
framework for the analysis in the present study. In order to understand why and how musicians talk
about these constructs in relation to their own performances, qualitative research methods are most
appropriate. Henwood and Pidgeon (1992) state that qualitative analysis is the representation of reality
through the eyes of the participants, and it is this reality of musical performance that the present study
aims to explore. Discourse analysis is the method that will be employed in order to identify and
interpret the individual's construction of self. It is considered that how an individual talks about these
constructs provides a sense of how their identity is constructed. This is determined by the different
social encounters in which they are involved.
SOCIAL CONSTRUCTIONISM AND DISCOURSE ANALYSIS
According to Burr (1995), an individual's identity is constructed socially, out of the discourses
(spoken interaction) culturally available to us. It is the product of social encounters and relationships
with others. Therefore, identity is created rather than discovered. Neimeyer (1998) refers to two "arenas"
as the concepts that underlie the social constructionist movement: 'language' and 'the self'.
Widdicombe and Wooffitt (1995) state that the focus should be on the text or discourse within or
through which selves and identities are socially constructed.
Language is organised into discourses or interpretative repertoires, which in turn employ various
devices that can be used in the identification and interpretation of the individual's construction of self.
Sherrard (1999) defines a repertoire as a recognisable, self-contained point of view. Within this notion
of interpretative repertoire, three aspects of language have been proposed (see, for instance Potter and
Wetherell, 1987). Firstly, that when analysing the accounts that individuals give, there is always
variation, which, in this particular movement, carries more importance than that of consistency.
Secondly, talk has a variety of different functions, other than just the process of transmitting
information. Thirdly, an individual's talk (or writing) is drawn or constructed from existing resources.
It is these resources that have been labelled repertoires. Burman and Parker posit that these repertoires
are not newly created when individuals speak, but instead have been borrowed or remoulded for a
particular purpose at that particular time.
Within the repertoires, a number of different devices may emerge. These could include for example,
resolutions which Sherrard (1999) states are attempts by the individual to resolve contradictions when
they become apparent, and subject positions, which Burr defines as the positions that individuals
occupy in talk and the implications that this has for the individual involved. She illustrates this idea of
positioning in the area of gender, drawing on how men and women are positioned within particular
discourses and what this might be saying in terms of the power relations between them. Other devices
such as dilemmas or metaphors presented by the individual may also be apparent in the text.
Method
Participants
Seven musicians (three females and four males) from the university music department, preparing for
final year recitals were asked to participate. Although participants were approached randomly, a balance
between gender and choice of instruments was sought. The range of instruments included guitar, flute, piano,
clarinet and saxophone. The conditions of being able to do the recital were that they were all of
diploma standard and had to achieve a minimum of 60% in the prerequisite module. In addition, three
of the seven musicians chose to do two recitals, which required auditions. Choosing a second recital
had no bearing on the interviewee's level of musicianship and was considered in the same light as
those that chose to do only one recital.
Procedure
The seven participants were interviewed separately at two points in time. The interviews were
semi-structured and were tape recorded (with permission) for later transcription. All participants were
assured that their data would remain anonymous.
The first interview took place one month before the recitals and lasted approximately one hour. The
second interview of approximately fifteen minutes was conducted no more than seven days after the
recital, in order to confirm the accuracy of the interpretations made during the first interview. This
was to clarify and expand on any areas that were ambiguous and also to record participants' reflections
on their recital performances.
Data Analysis
The raw data was first transcribed before analysis. Through discursive analysis (Potter and Wetherell,
1987), specific attention will be given to how the musicians construct their conceptions of ability and
their evaluations of the performance outcome. In particular, it will be noted where an individual may
apportion blame or offer justification for the performance outcome.
To increase the credibility of the research, the interview transcripts were read and re-read, with all
interviews analysed by two researchers. On discussion the same interpretations were concluded from
the analysis. Within the interviews themselves, the second, follow-up interview was an opportunity to
confirm with the participants that the interpretation by the researcher of the initial interview was
correct. This idea of objectivity, in this instance, participant validation, has been proposed by
Henwood and Pidgeon (1995) in terms of making the research more credible.
Reflexivity involves a reflection of the ways in which the researcher can influence the research
process. The researcher needs to be aware how their biases, interests, values and experiences can
affect the research and subsequent interpretations (Banister, 1994), as well as focusing on the power
imbalance between the interviewer and interviewee.
feeling nervous, you can let it all get on top of you" [Mark].
Mark suggests that the control comes from within, that the decision to choose to 'lose the situation' is
his. This suggests a reflection of his belief in his ability, i.e. if he believes in his ability then he should
be able to stay in control of the situation. It may be interpreted that he is suggesting that those who are
out of control are those who are less confident in their ability. This idea is supported as Mark makes
reference to nerves equalling the level of control the performer has:
"If it's not going well maybe you're too nervous and it's out of your control, if you do feel
that then I just try and kind of make sure that you kind of step back from it and think
right this is me playing and I'm in control, I can play this music...I'm going to really listen
to the notes I'm playing, and then you step back, you get out of that feeling of
nervousness ....if you do feel it's all running away from you it's because you are not all in
control, it's at that point that you've got to think I'm in control of this" [Mark].
Mark describes how he maintains control of the situation. At no point does he attribute a lack of
control to factors outside the individual. Because he is able to do this, he comes across as a performer
with a certain maturity, a musician that is at ease with his own ability. This idea is supported at
various times throughout the interview, as will be illustrated further in the analysis.
Effort
For most of the musicians, there are accounts of how much effort they have been putting into the
recital. Through mentioning their efforts they illustrate that they have taken responsibility for the
factors that are within their control, such as putting in the work that is required. This way, a
potentially negative outcome cannot be attributed to a lack of effort, as the following quotes suggest:
[When speaking of the music] "There is a lot of work gone into that for me, to give an
individual performance of it...I'd have probably didn't realise how much hard work
actually goes into it because even now, a few weeks before, there is still a lot of work
that needs to be done, so you think no matter how long, I mean even though I started
preparing for these about last year, you think well you know, should I have started
preparing earlier should I have thought about them the year before, I think its hard for me
to do that with other, with other things going on as well, with another subject" [Rachel].
[After the recital] " I was still just trying to enjoy it and been just thinking well I've put a
lot of work into this I've been practising for a year I don't want it to all go wrong on the
final day...at first it was a bit daunting but just as it went on I was thinking oh yeah this is
me I can play these pieces, all this work that's been put in" [Rachel].
Rachel implies in her discourse that she has put a lot of effort into the recital. She makes it explicit
that she started working on the recital over a year ago, and points out how much work is involved by
stating that there is still a lot of work that needs to be done. However, she justifies not spending
any more time on it by stating that she has other subjects competing for her time. It would be
reasonable to assume that a year spent on working towards a recital is long enough, and this way it
could not be concluded that Rachel did not put the required effort in. Her discourse not only illustrates
effort, but also commitment to the recital, as she has spent a year working towards it. Because Rachel
has invested so much time, she is more concerned about it going wrong on the day. It may be
interpreted that because she has convinced the interviewer that so much time went into the recital, a
negative outcome may be attributed more to her musical ability as a result.
However, for one musician, Nicola, her aim is to create the impression that she has not put the work
in. Through making reference to her laziness, or presenting herself as such, she provides a reason to
attribute a negative recital to nothing other than a lack of effort. This has the effect of taking the
emphasis away from attributing a negative outcome to her ability as a musician:
"...I am a bit lazy, I don't want to have to put the work in, I'd just like to be able to get up
there and do it...I think in a sense I prefer to just go in on the day and do it you know, I'm
afraid of getting bored....I don't know I seem to have a fairly laid back attitude to it but
that's probably more to do with laziness than, because then I do occasionally have stress,
panic moments when I think that I'm not going to be able to do it and I think well you
know, I should have started earlier, but on the whole I probably don't really think about it
so much, when it gets to the last couple of weeks I just get on with working on it because
I know that I have to then".
AI: "So what are you scared about?"
Nicola: "the worry about the overall degree...it will probably be the kind of thought on
the day that I've probably messed it up by not doing enough practice and it will mess up
my whole degree...because I leave it to the last minute that's part of what I worry about
just the fact that there are parts of it that could go wrong that I perhaps could have
practised more" [Nicola].
Nicola's justification for not putting in the work is that it helps her concentrate more and stops her
from getting bored. All her stressful moments or concerns about it going wrong are attributed to her
lack of practising, or not starting the preparation early enough. However, this also has the opposite
effect in that if the recital goes well, it is likely to be attributed to her ability, as she has already
presented herself as someone who did not work for the recital. Nicola points out further that her
laziness towards her recital is a result of who she is, as later on in the interview she states that she
approaches subjects other than music in a similar way.
Musical Skills and Musical Awareness
For a few of the musicians, a significant proportion of the interview focuses on the music that they are
playing. Mark, in particular, provides great detail as to what he is trying to achieve through the music,
which is the main focus of his interview. His discourse offers an understanding of what he believes is
involved in being a performer and in what he wants to achieve through the performance. This style of
discourse serves as a way of representing him as an individual with a high level of musical ability. He
appears as though he has a lot of experience in performing and has a good knowledge of what is
required of him as a musician, as the following quotes illustrate:
[In the solo pieces] "I'm very aware of the tone produced, and very aware of the
dynamics... its like if you are an artist and you've got a palette of paint, you can like sort
of dip into all different sorts of colours, just by changing the way you focus on the sound,
so that's what I concentrate on, and you just try and make some coherent musical idea out
of what I'm playing".
[Of the music] " I know exactly where it is going....if you think of it like where's this
within itself, where is that phrase musically going, how is it all related, what are the
important notes here, what's leading to it, then you just make sense of it musically and it
all becomes, one part of the coherent idea, so it makes it easier to perform".
"I quite like doing solo stuff because then, its all on you... you can really concentrate on
the sound you're producing, ...you're very free to interpret it as you like...if it's a piece
that really moves you, you listen to it and it really does something for you then, when
you come to interpret it, you put so much more into it...if you don't understand the
piece... then you don't make such musical sense of the notes" [Mark].
Mark's analogy of an artist is used to describe the process he uses to communicate the music. It also
suggests that he has worked hard at preparing for the recital (e.g. "I know exactly where it is going").
In addition, playing solo music gives him the opportunity to demonstrate this ability to others.
Through his discourse, he presents himself as a very confident and competent player, which is also
supported at other times, for example:
"I'm fairly competent on the clarinet....I know I'm one of these kind of people who
whatever will get through it" [Mark].
However, it may also be interpreted that the attention to detail he gives when talking about the music
in the interview takes the focus off him as a player, as often he makes generalised statements as to
what is required when performing.
Kate also gives detailed descriptions about her approach to the music. In this instance though, it is
used as way of justifying the amount of effort that is required. She also equates this higher level of
performing with experience and years of practice, which, in a general sense, is what music ability
means to her.
[Speaking about the music] "You need to give it the right sound and it's that side of
things that needs more effort too sometimes".
AI: "What do you mean by the right sound?"
Kate: "Well you can't just play the notes, and you can't just put the dynamics in because
that is not enough, its got to have a line and its got to have expression in it.... Its
definitely the sound and the feeling attached to it that you give people, and give yourself,
which I think is one of the harder things to do and I think it is the thing that comes
between years of practice and years of playing, it's the experience element I think again
that you need".
In her description, Kate presents the pace of the pieces as the main source of what makes them
technically demanding. This is also a feature of Paul's discourse as he talks about the music, although
his concern is to make it appear easy to the audience:
"The other pieces is sort of like very, technically involved, its very sort of, well,
ornamented and its quite virtuosic really, it's a bit of a nightmare (laughing), in my book,
I have to move my fingers pretty fast and stuff but yeah its great fun, hopefully that will
come across as well, the fun aspect of it will come across, trying to make something that
is incredibly hard seem quite simple and fun" [Paul].
As he talks about another one of his pieces, Paul indicates that not only was he having problems
learning the piece, but his accompanist was also experiencing difficulty. This lends support to his own
interpretation that the music is technically demanding. His confusion in relaying the different time
signatures of the piece adds to the overall impression that the piece he is learning is very difficult. The
fact that he states that he has only just worked it out implies how difficult it is. The pace of the piece
is also expressed in the way that he describes collapsing in a heap at the end of it, suggesting that it is
an exhausting piece to play:
[Speaking of one piece] "..Just looked at it and though, my God (laughing), where do I
possible start, its got an introduction part which is absolute hell to be honest, it's written
in simple time in respect that its sort of 12/4, 12/4?, no it isn't in 12/4, 12/8, ok and its in,
then it swaps between 4/4 and things but, that's quite simple, but the piano is into 12/4
and 12/8 and we're all different time signatures and we like what's going on, so its taken a
while to break that down into actual manageable parts..... we've just worked it out how to
actually play that part, but, the introduction sorts of like, its very sort of like, this is me,
very sort of, very showy-off-ee, but you just go out there and give it to them...it just
explodes at the end and when I finish I'll just collapse in a heap, but yeah, it will be good,
I'm looking forward to it" [Paul].
Through mentioning the technical demands of his programme, Paul presents himself as someone who
is a competent musician in that despite his struggle to learn and play the piece, he is able to make it
through to the end.
control, and this idea is supported by the way that Kate describes that she 'thinks I am getting above it
now... and I'm making it go the way I want it to go'. In this way, Kate is able to attribute a negative
performance outcome to the degree of risk in the programme. It may be suggested that the greater the
degree of risk the more the outcome can be attributed to external factors.
The risk is presented both in terms of her degree and herself as a musician. Its cost is in terms of being
awarded a low mark and potential negative evaluations being made of herself as a musician. It may be
suggested that the word 'risk' is a negative term.
This notion of risk serves as an interpretative repertoire for understanding her experience. Risk is used
in terms of representing the consequences of choosing a technical programme. When compared to the
other participants, the notion of challenge is another interpretative repertoire, which tends to carry a
more positive meaning. For example:
"there is a lot of work gone into that for me, to, to give an individual performance of it
whereas a lot of music you can see the phrases and the dynamics which you can follow,
but on this there isn't anything so I thought that was more challenging to do something
like that, to put more of my own thoughts into the music" [Rachel].
"I'm doing a concerto... and it's quite challenging as well it's got a few really quite
technically difficult passages" [Nicola].
In this context, challenge suggests a belief in ability, i.e. that the individual is secure enough in their
belief to push themselves out of 'the boundaries' which ultimately means taking responsibility for the
outcome. Kate knows her own boundaries, she implies that she is still taking a risk even within what
she believes she can play. When Kate does use the word challenge it is used in a positive way as a
motivator, i.e. without challenge no effort would be made. Its use in this context is in referring to a
global belief that without challenge the recital would be too easy. It is presented in a depersonalised
way as she takes the focus away from herself (i.e., through the use of 'you' rather than 'I').
The Audience
For one participant, Simon, his expectations for a successful recital are weighted heavily according to
how he views the expectations of his audience. For Simon, the real issue is how he is going to be
judged as a musician. It may be suggested that he is concerned that his own perceptions of his musical
ability will not match the perceptions of the audience or judges. Therefore, the greater the discrepancy
between his own and others' perceptions, the more nervous he is likely to be. How good he really is
will be indicated through audience feedback and the recital mark. However, through his discourse,
Simon suggests a lack of confidence in his ability. His perceptions of the audience's evaluations are
always negative; it never occurs to him that they may make positive evaluations:
"Perhaps it is a little bit of paranoia but you think that the audience can pick up on
absolutely everything you do wrong...when you are playing that's the way it seems and
once you make the first mistake it's as if the audience are, they're kind of bearing with
you rather than listening to you, that's not true but that's the way it seems, well for me
anyway when I'm playing....I know it might seem a bit selfish but it's as if the audience
are a distraction, it's as if there against you rather than being on your side ... I don't know,
its probably the nervous paranoia type thing again in that perhaps there are people in the
audience thinking, well just waiting for you to slip up and completely fall down"
[Simon].
"You are being assessed, there are people there who are going to be sitting there judging
how good I am as a musician...I'm worried that the people assessing me will think I'm
terrible or whatever...it is a personal thing, how well you can play your instrument, its
very very personal, for somebody to actually put a number on it... if they put a low
number on it then its going to be soul destroying" [Simon]
For Simon the mistakes lead to a negative view of the performer, and after the first one has been made
it is as if the audience is waiting for their negative evaluation to be confirmed. This also suggests a
lack of belief in his ability, as it implies that he knows he is going to make mistakes before he even
begins the recital.
For Ben, this idea of the audience evaluating him is also significant, particularly when the audience do
not know him. If the audience know him and his ability, and the recital is not as good as it could be,
then the audience are less likely to attribute the outcome to his musical ability but rather to the recital
situation. Those that do not know him are left making an assessment of his ability:
"You don't want the recital to reflect badly on your recital marks, because there is an
audience there, there is a number of people that are going to get an interpretation of you,
perhaps a lot of people there won't know me so I think it will be a bad thing for them to
get a bad interpretation of me, I think that's what worries me the most" [Ben].
For Ben, what adds to these evaluations is how he is going to be compared to the other performers.
The order of the recitals means that he is performing last and this puts pressure on him to do well. He
illustrates a lack of belief in himself, as he does not want to be compared to the other musicians that
have gone before him:
"I'm one of the last, I have to sit through an awful lot of recitals, an awful lot of people
doing well, and that kind of compounds the anxiety... however much they say that they
don't I'm sure that your recital's gauged against other peoples" [Ben].
General Discussion
The analysis in this paper has dealt with two main questions: 1) in what ways do the musicians
construct their notion of ability? and 2) how do the musicians apportion blame or responsibility in terms
of their ability and the recital outcome? In light of the discursive analysis, each will be discussed in
terms of what this tells us about their sense of identity.
In what ways do the musicians construct their notion of ability?
One of the main ways in which the musicians present their self-beliefs about ability is through their
discussions of the various difficulties associated with their programmes. Kate and Paul gave examples
where the fast pace of the music gives the impression that it is very difficult. Others emphasised the
difficulty of the music by discussing how long it took them to learn it. Thus, ability means being able
to perform successfully a technically demanding programme.
In contrast, for a few of the musicians, a significant proportion of the interview focused on the music
that they were playing not in terms of its perceived difficulty, but in terms of being able to
communicate musical ideas. This was evident in Mark's account where he was keen to display an
understanding of the music, which in turn suggested a high degree of musical awareness. In other
words, musical ability is presented not only as mastering the technical demands of the music and
displaying this mastery to the audience, but also as being able to communicate the musician's own
interpretation of the music.
For the musicians, how they perceive their ability is related to how they evaluate the outcome of the
recital in terms of how much it is going to be affected by nerves or poor audience evaluations. This
idea was illustrated in Kate's discourse when she described the outcome of the recital as dependent on
the amount of risk that was taken in the programme choice.
For the musicians, their sense of self is related to the idea that being a musician means having a high
musical ability. How the participants construct this notion of ability is therefore a reflection of their
perceptions of self as a musician.
How do the musicians apportion blame or responsibility in terms of their ability and the recital
outcome?
For all musicians the outcome of the recital was described as being dependent on the actual day. This
was suggested by the perceived level of control over the performance that emerged in their discourse.
Those with less belief in their ability took less responsibility for the recital outcome,
and in these instances they tended to apportion blame to nerves and the perceived negative evaluations
made by the audience.
It may be interpreted that the construction of identity as a musician is determined to a large extent by
the musicians' beliefs in their ability and in their perceptions of the recital outcome. This was reflected
in two extremes presented in the discourse of the musicians: from those who took little personal
responsibility for the recital outcome through to those who strove to adopt complete control over the
performance situation.
For most of the musicians, it is important to be seen to be putting a lot of effort into the recital. This
way a potentially negative outcome cannot be attributed to a lack of effort. Again, two extremes are
presented here. One musician is eager to illustrate a total lack of effort in terms of the recital
preparation; this way, if the outcome of the recital is negative, it can be attributed to a lack of
preparation rather than a lack of musical ability. At the other extreme, most of the musicians want to
illustrate that they have worked hard; this way, if the recital outcome is negative, they can justify the
result in terms of other factors which are out of their control, such as debilitating nerves during the
performance. Whatever the technique used, the musicians want to avoid any negative evaluations of
their ability. Their aim is to create a positive impression of themselves as competent musicians. It
may be suggested that by creating these impressions, the participants are in conflict with their own
perceptions of their musical ability. Through this discourse, they are therefore able to maintain their
beliefs about their identity as a musician.
Conclusion
In this study the theoretical orientation of social constructionism and discursive analysis was
employed to explore issues of self-belief and identity in seven musicians preparing for a music recital.
There is a need in the music performance literature to provide fuller accounts of musicians' experiences of
preparing for a public performance. The analysis has not only provided a deeper understanding of how
undergraduate music students' identity is constructed within the context of music performance, but
also framed the way in which they viewed motivation. This new insight into music and motivational
theory has important implications for the development of performing musicians in terms of why they
are choosing to perform and the process of their preparation.
As there has been little research to date that has produced in-depth data on motivation and identity in
musicians, future research could explore these issues in relation to other constructs of achievement
motivation. Those in particular that can be explored are the individuals' construction of task values
and goals in terms of the recital, and what this tells us about their sense of identity as a musician.
References
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood
Cliffs, NJ: Prentice-Hall.
Banister, P. (1994). Report writing. In P. Banister, E. Burman, I. Parker, M. Taylor & C. Tindall,
Qualitative methods in psychology (pp. 160-179). Buckingham: Open University Press.
Burman, E. & Parker, I. (Eds.). (1993). Discourse analytic research. London: Routledge.
Burr, V. (1995). An introduction to social constructionism. London: Routledge.
Harter, S. (1982). The perceived competence scale for children. Child Development, 53, 87-97.
Harter, S. (1992). The relationship between perceived competence, affect, and motivational
orientation within the classroom: Processes and patterns of change. In A.K. Boggiano & T.S. Pittman
(Eds.), Achievement and motivation. (pp.77-114). Cambridge: Cambridge University Press.
Henwood, K.L. & Pidgeon, N.F. (1992). Qualitative research and psychological theorising. British
Journal of Psychology, 83, 97-111.
Neimeyer, R. (1998). Social constructionism in the counselling context. Counselling Psychology
Quarterly, 11 (2), 135.
Potter, J. & Wetherell, M. (1987). Discourse and social psychology: Beyond attitudes and behaviour.
London: Sage.
Sherrard, C. (1999). Repertoires in discourse: Social identification and aesthetic taste. In N. Hayes
(Ed.). Doing qualitative analysis in psychology. (pp. 69-83). Hove, UK: Erlbaum Psychology Press.
Widdicombe, S & Wooffitt, R. (1995). The language of youth subcultures. London: Harvester
Wheatsheaf.
Wigfield, A. & Eccles, J.S. (1994). Children's competence beliefs, achievement values, and general
self-esteem. Journal of Early Adolescence, 14 (2), 107.
Wigfield, A., Eccles, J.S., Mac Iver, D., Reuman, D.A., & Midgley, C. (1991). Transitions during
early adolescence: changes in children's domain-specific self-perceptions and general self-esteem
across the transition to junior high school. Developmental Psychology, 27 (4), 552-565.
Wigfield, A., Eccles, J.S., Yoon, K.S., Harold, R.D., Arbreton, A.J.A., Freedman-Doan, C., &
Blumenfeld, P.C. (1997). Change in children's competence beliefs and subjective task values across
the elementary school years: a 3-year study, Journal of Educational Psychology, 89 (3), 451-469.
Proceedings paper
Conversely, Reinecke (1974) stated that "no evidence has been found to prove that one specific musical
piece has only one 'right' tempo"(p.414). Here one may conclude that, in a single-movement
composition or between the movements of large-scale compositions, the relation of tempi to each other
may be constant and in a definite and unambiguous relationship to an "inner" or "base" (Margulis,
1984) tempo, which, on the other hand, cannot be determined by the musical structure in a precise and
absolute way. This may perhaps be why composers set metronome marks on their music.
Although tempo is considered to be a prominent factor in harmonic rhythm, it is surprising that music
theorists have paid relatively little attention to it. Indeed, there are apparently no theories of music that
assert that, because all note values are obviously relative to each other, a specific time value can only be
determined by relating the speed of the temporal structure of music to "real" (externally
metered) time. While Glenn Gould (1982) considered the tempo of a composition to be "one constant
reference point," Cooper and Meyer (1966), on the other hand, criticised the notion of fixed
relationships of pulse and the concomitant belief in an absolute tempo:
Tempo, though it qualifies and modifies [pulse, meter, and rhythm], is not itself a mode of
organization. Thus a rhythm or theme will be recognizably the same whether played faster or
slower. And while changes of tempo will alter the character of the music and perhaps influence
our impression of what the basic beat is (since the beat tends to be perceived as being moderate in
speed), tempo is not a relationship. It is not an organizing force... It is important to recognize that
tempo is a psychological fact as well as a physical one (p. 3).
Concurring with Cooper and Meyer, Kramer (1988) stated: "If we consider tempo as both the rate of
beats and the rate of information, then we can incorporate into this broad concept both the objectively
measured and the subjectively felt."
Physiological Basis of Tempo Consistency
The histories of performance practice and psychology teach us that people have long attempted to
define relationships between "real" time (physical, actual or clock-time) and "musical" time
(psychological, psychical or virtual time). For example, there is an ample and conflicting literature
documenting attempts to support the belief that human pulse serves as a physiological basis of time
sense and musical tempo. As early as 1696, Loulie constructed a pendulum with 72 different swing
durations in an attempt to measure the musical effect according to an average number of pulse strokes.
Winckel (1967) stated quite explicitly that this kind of measurement would not do. Jacques-Dalcroze
(1912) also supported the view that the human heart provides a basis for rhythm. Jones (1976) noted that
with increased arousal by means of stimulants, familiar patterns of music unfold more slowly than
usual. Conversely, reporting on his experimentation with mescaline, Huxley (1960) found that music
perception did not distort. Further, when Fuchs (1953) "metronomized" Bach's Mass in B Minor--each
movement separately and on various days--he found that his beat was consistently near 80 beats per
minute. Fuchs concluded: "the pulse can certainly measure music. But just as certainly it does not rule
it" (p. 34).
Conversely, Radocy (1980) pointed out that people perceive music of varying rhythmic regularity and
tempo regardless of the speed of physiological processes. Moreover, measuring the principal tempo of
an extensive number of selected recordings known as the Carnegie set, Hodgson (1951) proposed that
all music is based on one fundamental psychological tempo range between 60 and 70 beats per minute,
and it is this psychological range that largely governs our decisions about musical tempo. From a
phenomenological point of view, Clifton (1984) made the following comment: "The "time sense"
cannot be attributed to a specific organ or physiological function. If the term makes sense at all, it can
only refer to the activity of human consciousness" (p. 56).
movement. As Donington (1963) claims, "Dance steps can only be performed correctly within narrow
margins of speed." (p.392) Another criticism of this work must be directed at the impreciseness of the
apparatus, although it is obvious that the researchers did the best they could with the tools available at
the time.
Fifty-four years later, Halpern (1988) conducted a two-part study with college students unselected for
musical ability; it is remarkably similar in purpose and design to the 1934 work by Farnsworth and his
associates, although Halpern does not note the connection. In her
investigation, nineteen well-known popular songs served as stimuli and were presented to subjects by
an Apple II computer controlling a synthesiser (Study 1). Instead of manipulating the tempo lever of a
player piano, as was the case in Farnsworth's study, subjects could change the tempo of the tunes by
manipulating the software interface on the computer until they sounded "correct." Moreover, instead of
tapping on a telegraph key, subjects were instructed to set a metronome to coincide with what they
imagined to be the "correct" tempo of the songs. Results reported a generally positive relationship
between the metronomic evaluations and the setting of the tempi on the computer, i.e. between
"imagined" and "perceived" correct or preferred tempi for each tune. The results are indeed similar to
those found by Farnsworth and his associates concerning the positive correlation between the tapping
task and the setting of the tempo lever. It was also found that imagined tempi seemed to regress to a
middle range of approximately 100 beats per minute, between the faster and slower perceived tempi. In
Study 2, though, which utilised 10 of the tunes of Study 1 and only the "imagery" task (i.e. the
metronome setting), it was reported that the mean preferred tempo was 109 beats per minute,
significantly faster than the mean imagined tempo from Study 1 and much closer to the mean tempo of
120 beats per minute reported in the Farnsworth et al. study. Both parts of Halpern's research suggest that
familiar, popular tunes are represented in our mind with a particular tempo.
Interesting as these results may be, they do not demonstrate whether judgements of correct tempo are
consistent across separate trials over an extended period of time, especially when subjects are presented
with musical compositions chosen because they represent a wide range of musical styles and
familiarity. It also seemed important to investigate how tempo judgements might differ among subjects
with different musical backgrounds.
To investigate these issues, Lapidaki & Webster (1991) conducted a study in which subjects were 15
highly experienced musicians (5 composers, 5 performers, and 5 music education specialists) recruited
from a pool of professors and graduate students of a School of Music in the Midwestern United States
and 5 nonmusicians who were professors and graduate students from other departments of the
university and had little formal music education and involvement in musical activities. Three music
examples (J. S. Bach's "Air in D Major" from the Suite Number 3 in D major; F. Chopin's Prelude
Number 7, Op. 28, and A. Schoenberg's second piece from "6 kleine Stücke," op. 19) were chosen
because they represented a wide range of musical styles and familiarity. All subjects were tested
individually at three sessions at three-day intervals. For each of the three testing sessions, subjects were
asked to make correct tempo judgements of each of the three compositions. The initial tempo of the
presentation of the compositions was varied systematically in each session.
The findings of Lapidaki & Webster's study (1991) showed that when tempo is judged by highly skilled
musicians in repeated listening tasks of the same compositions, initial tempo has a dominant effect on
correct tempo judgements. Simply stated, no single correct tempo emerged as a consistent entity of
individual or group performance across the three trials. The sample of adult nonmusicians indicated a
basis for a similar conclusion. Nevertheless, this tended to vary according to the composition in
question. These results did not support the observations reported by Farnsworth et al. (1934), Halpern
(1988), and Levitin & Cook (1996) that one tempo is consistently associated with particular listening
examples. On the contrary, listeners' perceptions of correct tempo for a particular composition varied
dramatically from trial to trial. Few statistically significant differences in consistency of tempo
judgements were found as a result of musical background and compositional style. Many of these
tendencies suggested important questions for further study.
It was obvious, however, that additional work was necessary with larger and more varied musical
samples and with better measures of individual familiarity with, and preference for, the judged
compositions. Also of interest would be how these judgements may differ among subjects from
different age groups and musical backgrounds.
The majority of empirical studies on tempo perception have been carried out on adults (Farnsworth et
al., 1934; Halpern, 1988; Hodgson, 1951; Lapidaki & Webster, 1991; Levitin & Cook, 1996; Lund,
1939). However, there is general agreement that the experience of musical time is not separable from
the subjects' age (Bamberger, 1994; Petzold, 1966; Shuter-Dyson & Gabriel, 1981; Zenatti, 1993). To
counter this deficiency, it has proved necessary to investigate the following question: Is the capacity for
consistent tempo judgements for particular pieces of music affected by the age of listeners (e.g.,
preadolescents, adolescents, and adults)? Once the age question has been answered, it might then be
possible to set varied music educational standards for each age level by considering the often
overlooked development of temporal perception in students and, in turn, create a more effective
condition for the growth of musical experience.
Furthermore, the capability to perceive different musical parameters, such as tonality, harmony, form,
and rhythm, without being able to identify and analyse them, is considered to be the outgrowth of
implicit musical knowledge or acculturation (Hargreaves, 1986; Francès, 1988; Bigand, 1993). In other
words, in this situation what listeners know is not something they are aware of knowing, but rather it is
acquired from knowledge that is implicitly or subconsciously built into their auditory systems through
common everyday exposure to music in their cultural environment. There is general agreement among
researchers, on the other hand, that this knowledge becomes explicit or conscious only after musical
training (Dowling, 1993). In essence, musicians presumably possess a fuller understanding and
appreciation of a piece of music, due in part to their command of a sophisticated scheme or set of
rules for encoding its musical events in terms of musical meanings and thus assigning to it a stable
structural description (Sloboda, 1994; Dowling, 1994; Wolpert, 1990; Lerdahl & Jackendoff, 1983).
The study was therefore concerned with whether the musical background of listeners, that is, the level
of formal music education and/or participation in specialised musical activities, affected the consistency
of their perception of the correct tempo.
Purpose
The present study was designed to investigate the consistency of "correct" tempo as it might exist in
compositions of various musical styles when evaluated by subjects of differing musical
background, age, familiarity with, and preference for the selected music. It should be noted that the study
was about the extent to which individuals can set consistent tempi across four separate trials: no attempt
was made to establish whether or not these tempi were correct as compared with those set by the
composers in the original pieces. Along these lines, it was reasoned that if a correct tempo did exist,
subjects ought to be able to arrive at consistent judgements about the tempo of examples despite the
examples being presented with differing initial tempi in every session.
Is there an "absolute" or "right" tempo which may be considered as a unifying construct of the music
examples chosen and whose function is the synthesis of finite, juxtaposed musical elements in relation
to "real" time? Is the concept of a particular tempo represented in the mind as a consistent musical
entity like pitch, perhaps due to a distinct, yet unconscious, psychobiological clock "programmed"
during the listening process?
To investigate these issues, we reasoned that if an "absolute" tempo did exist, subjects ought to be able
to arrive at a consistent decision about the tempo of examples if these judgements occurred over a
period of several days and if the initial tempo of each hearing was varied systematically. We also
wondered whether listeners from different age groups with high levels of formal music education and
listeners with little formal music education would demonstrate different levels of consistency. Finally,
what effect would the style of the listening examples have on consistency of judged tempo?
Research Questions
(1) Is there a consistent judgement of correct tempo across four separate sessions of the same musical
examples using varying initial tempi for each trial?
(2) Is the consistency of tempo judgement affected by the age of the listener?
(3) Is the consistency of tempo judgement affected by the musical background of the listener?
(4) Is the consistency of tempo judgement affected by the style of music?
(5) Is the perception of tempo affected by the familiarity with
a) the individual pieces and
b) their overall style?
(6) Is the consistency of tempo judgement affected by the listener's preference/liking for a particular
musical example?
Methodology
Apparatus
The software program employed for both recording and playback of performance data was the
professional MIDI sequencing program Performer from Mark of the Unicorn. This program was chosen
in large part because of its ability to alter the graphic window display on the computer screen so that the
metronome controls could be easily manipulated. In addition, the program had the capacity to vary the
tempo precisely, without altering any other musical attributes (e.g., pitch, timbre, articulation, etc.).
The tempo of each musical example (that is, the initial tempo) could be easily set by the experimenter
prior to each session of each musical example. The mouse was used by the experimenter to manipulate
the tempo, following the explicit directions of each subject. Set in manual tempo mode, the tempo slider
of the graphic window display on the Macintosh was used to display and change the tempo in real time
in the metronome window. To change tempo, the experimenter dragged the triangular indicator along
the slider: to the left decreased the tempo, to the right increased it. The experimenter could also use the
arrows at either end of the slider: the + (plus) arrow increased the tempo and the - (minus) arrow
decreased it. Subjects were not asked to use the mouse themselves, since to do so would have required
training for a number of subjects.
Selection of Musical Examples
In all trials subjects listened to the following six compositions: C-major and A-minor Two-Part
Inventions by J. S. Bach (Bach I and II, respectively), Clair de Lune by Claude Debussy, Piano Piece
by Michalis Lapidakis, Yesterday by the Beatles, and The Children of Piraeus (Never on Sunday) by
Manos Hadjidakis. These works were chosen because they represented a wide range of musical styles
(Baroque, Impressionistic, contemporary idiom, rock ballad, and dance music), familiarity, and
preference.
Subjects
Subjects (n=90) were recruited from three age groups: 30 adults (25-52 years), 30 adolescents (junior
and senior high school students), and 30 preadolescents (fifth and sixth grade children). Individuals of
each age group were selected on the basis of musical background and willingness to participate. Within
each age group, half the subjects were musicians, half were nonmusicians.
Procedures
For the four testing sessions, subjects were asked to listen to each composition and tell the experimenter
to alter the tempo upwards ("faster") or downwards ("slower") until the tempo was right; that is, the
most appropriate tempo for that composition, in the opinion of the listener. Once the six compositions
were judged, the subject was asked to return in at least four days' time for the next session. This slow
pacing of trials was adopted in order to prevent memory carryover from one trial to another.
Each session for each subject systematically varied the order of the compositions and the initial tempo
(I.T.) of the listening examples in order to eliminate the possibility of contextual cues. Two initial
tempi were used: M.M. ♩ = 20 (slow I.T.) and M.M. ♩ = 200 (fast I.T.); all tempo judgements in the
Lapidaki & Webster study (1991) had lain within this range. Each initial tempo was presented twice:
either in the first and third or in the second and fourth trials.
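The counterbalancing just described can be sketched as follows. How subjects were assigned to the two initial-tempo orders is not stated in the text, so alternating by subject index here is purely an illustrative assumption, and the function name is hypothetical.

```python
# Sketch of the initial-tempo (I.T.) counterbalancing described above:
# two initial tempi (MM 20 and MM 200), each presented twice across the
# four trials, either in trials 1 & 3 or in trials 2 & 4. The assignment
# of subjects to the two orders is an assumption made for illustration.

SLOW_IT, FAST_IT = 20, 200  # metronome marks used in the study

def initial_tempo_schedule(subject_index):
    """Return the initial tempo (beats per minute) for trials 1-4."""
    if subject_index % 2 == 0:
        return [SLOW_IT, FAST_IT, SLOW_IT, FAST_IT]  # slow I.T. in trials 1 & 3
    return [FAST_IT, SLOW_IT, FAST_IT, SLOW_IT]      # slow I.T. in trials 2 & 4

# Every schedule uses each initial tempo exactly twice.
for s in range(4):
    assert sorted(initial_tempo_schedule(s)) == [20, 20, 200, 200]
```

Either way, each listener hears each initial tempo twice, so trial-to-trial consistency can be assessed both within and across initial tempi.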
In order to examine subjects' familiarity with the listening examples a questionnaire form was handed to
them at the beginning of the first testing session. Subjects had to answer questions concerning their
familiarity with the particular example and its relevant musical style, after they judged the correct
tempo of each example.
Finally, with regard to the question of their individual preference/liking for a particular musical
example, subjects were asked to rate it on a scale ranging from 1 (least-liked or poor) to 4 (most-liked
or excellent), after they judged the correct tempo of the example at the fourth testing session. This
information was recorded and used in later analyses.
Results
To test the hypothesis that listeners would render consistent tempo judgements independently of the
initial tempi, a one-way repeated measures ANOVA was performed for each musical example, with
trial (four levels) as the independent variable and tempo judgement as the dependent variable. The .05 level of significance was
adopted as the alpha level for these tests.
Results for these analyses show that listeners' judgements of the most appropriate tempo differed
significantly across the four trials, i.e. they did not exhibit consistency (Bach I, F=84.43, p <
.0001; Bach II, F=86.27, p < .0001; Debussy, F=80.37, p < .0001; Lapidakis, F=139.07, p < .0001;
Beatles, F=59.02, p < .0001; Greek dance, F=78.86, p < .0001).
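The analysis above can be sketched as a textbook one-way repeated-measures F computation; this is not the authors' code, and the judgement values below are hypothetical, chosen only to show how systematic trial-to-trial differences produce a large F statistic.

```python
# Sketch of a one-way repeated-measures ANOVA testing tempo-judgement
# consistency across four trials. Data are HYPOTHETICAL judgements
# (beats per minute), not the study's data.

def repeated_measures_anova(data):
    """data: one list per subject, one tempo judgement per trial.
    Returns the F statistic for the trial (within-subject) effect."""
    n = len(data)          # subjects
    k = len(data[0])       # trials
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    trial_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_trials - ss_subjects   # subject-by-trial residual
    df_trials, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_trials / df_trials) / (ss_error / df_error)

# Hypothetical judgements: each listener drifts with the initial tempo,
# so judgements differ systematically across trials (i.e. inconsistency).
judgements = [
    [60, 96, 64, 100],
    [72, 110, 70, 104],
    [66, 90, 75, 95],
    [58, 102, 62, 108],
]
f_stat = repeated_measures_anova(judgements)
print(f"F({len(judgements[0]) - 1}, "
      f"{(len(judgements[0]) - 1) * (len(judgements) - 1)}) = {f_stat:.2f}")
```

For these hypothetical data the result is F(3, 9) ≈ 42, significant at any conventional alpha level: large trial-to-trial differences, exactly the pattern the study reports.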
Further examination of the results revealed that the means of tempo judgements for the trials with the
fast initial tempi were higher than the means for the trials with the slow initial tempi for all
musical examples: the slower initial tempo generally evoked slower preferences, and vice versa.
Furthermore, in order to ascertain which age group exhibited the highest degree of consistency, the
individual deviation scores (IDS) averaged over the four trials of each piece were used as an additional
measurement of tempo judgement consistency for each musical example. IDS reflects the standard
deviation of the four different tempo judgements (Y1, Y2, Y3, and Y4) at the four trials for an
individual. IDS gives a more global sense of the deviations within each group. IDS was used as a primary
response variable to answer questions about consistency associated with other factors of interest, such as
age, musical background, and musical style. To compare consistency across the musical styles, an
analysis with IDS as the
response variable was performed. The results revealed that the style of rock ballad exhibited the highest
degree of consistency (M=23.27, SD=22.54) followed by the styles of Greek dance music (M=30.90,
SD=25.02), Impressionism (M=35.51, SD=26.29), and Baroque (M=36.51, SD=29.53; Bach I,
M=36.53 and Bach II, M=36.49), respectively (F=13.68, p < .0001). The tempo judgements for the
contemporary idiom were the least consistent of all styles (M=52.55, SD=31.56). In other words,
the following consistency scale, from most to least consistent, was observed in subjects' tempo
judgements with respect to musical style: rock ballad < Greek dance music < Impressionism < Baroque < contemporary idiom.
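The IDS measure described above amounts to the standard deviation of one listener's four tempo judgements for a piece, averaged over listeners to compare groups or styles. Whether the authors used the population or sample form of the standard deviation is not stated, so the population form is assumed in this sketch, and the data are hypothetical.

```python
# Sketch of the individual deviation score (IDS): the standard deviation
# of a listener's four tempo judgements (Y1..Y4) for one piece; lower
# IDS means greater consistency. The population form (pstdev) is an
# assumption; the study does not specify which form was used.
from statistics import mean, pstdev

def ids(judgements):
    """IDS for one listener and one piece."""
    return pstdev(judgements)

# A perfectly consistent listener has IDS = 0.
assert ids([96, 96, 96, 96]) == 0.0

# Group-level summary for one piece: mean IDS over listeners
# (hypothetical judgements in beats per minute).
group = [[90, 110, 90, 110], [100, 100, 104, 104], [80, 120, 85, 115]]
print(mean(ids(j) for j in group))
```

Averaging IDS within each style (or age or background group) then yields the group means such as M=23.27 for the rock ballad reported above.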
A repeated measures MANOVA was performed using tempo judgements for each example averaged
over the four trials and the 5 familiarity levels as variables. Results indicated that familiarity with
musical examples significantly influenced tempo judgements (p< .001).
Furthermore, a repeated measures MANOVA was employed using tempo judgements averaged over the
four trials and preference levels as variables. Results revealed that tempo judgements were significantly
affected by subjects' preference for the musical examples (p < .05).
The musical ability of 'absolute tempo'
A closer look at the range separating the fastest from the slowest tempo judgements of individual
subjects for each piece often revealed strikingly small discrepancies. It appears that a relatively small
number of listeners (e.g., adult musicians and non-musicians) possess an exceptional ability with
respect to acute stability of large-scale timing in music. This ability to give consistent tempo
judgements for a piece of music over time, in conditions seemingly devoid of an external tempo reference (a score
or the body interaction involved in performance) may be referred to as absolute tempo, analogous to
absolute pitch.
It must also be noted that "absolute tempo" has been observed with musical examples that were
thoroughly known by the subjects. Nevertheless, this finding should be treated with caution, since these
subjects did not exhibit the ability of absolute tempo with respect to all pieces for which they had the
same level of familiarity. In contrast to absolute pitch, the same person seems to follow a different
cognitive strategy of timing for each individual piece, which leaves one wondering whether the
stability in question is to some extent discrete rather than continuous.
Interestingly enough, these subjects reported that they were surprised when they heard that their right
tempo choices were virtually identical across trials. Thus, it would seem that physical, psychological,
and environmental factors, such as fatigue, mood, or time of day, did not have an effect on their tempo
judgements. One reason might be that music engages and programs psychobiological clocks or neural
oscillations (Goody, 1977; Epstein, 1985; Clynes, 1986; Pöppel, 1990) which function subconsciously
but give conscious read-outs and thereby guide the listeners' choice of right tempo in an exact and stable
manner.
Recommendations for Music Education
Perhaps the most important insight gained from this study is that right tempo judgements lie deeply
within the human ear, which intuitively attempts to supply its own right tempo to melody, phrasing,
harmony, rhythm, and other long-scale musical events, in order to ensure their meaningful coordination
and motion through real time. Along these lines, it becomes obvious that music educators can guide
students to achieve a better sense of recognition and mastery of all kinds of relations in a piece of music
by helping them develop a more refined or discerning concept of tempo (Lapidaki, 1992 & 2000).
To help students of all ages to find a use for the concept of tempo in music, music educators may
consider the design of this research, which proposes a fascinating, creative, and, most importantly, an
intrinsically musical activity reflecting our need to organise and control the passage of time in music by
means of digital technology (Lapidaki, 1990).
In this context, the finding that most listeners did not exhibit the musical ability of absolute tempo
becomes a secondary issue. Indeed we all vary in the abilities with which our aesthetic perceptions
operate. After all, we are not metronomes.
References
Aldwell, E. & Schachter, C. (1978). Harmony and voice leading (Vol. 1). New York:
Harcourt Brace Jovanovich.
Bamberger, J. (1994). Coming to hear in a new way. In R. Aiello (Ed.), Musical
perceptions (pp. 131-151). New York: Oxford University Press.
Berry, W. (1986). Form in music (2nd edition). Englewood Cliffs, N. J.: Prentice-Hall.
Bigand, E. (1993). Contributions of music to research on human auditory cognition. In S.
McAdams & E. Bigand (Eds.), Thinking in sound. The cognitive psychology of human
audition (pp. 231-277). Oxford, UK: Clarendon Press.
Braun, F. (1927). Untersuchungen über das persönliche Tempo [Investigations on the
personal tempo]. Archiv der gesamten Psychologie, 60, 317-360.
Clifton, T. (1984). Music as heard: A study in applied phenomenology. New Haven: Yale
University Press.
Clynes, M. (1986). When time is music. In J. R. Evans & M. Clynes (Eds.), Rhythm in
psychological, linguistic, and musical processes (pp. 169-224). Springfield, IL: Charles
C. Thomas.
Clynes, M., & Walker, J. (1982). Neurobiologic functions of rhythm, time and pulse in
music. In M. Clynes (Ed.), Music, mind and brain: The neuropsychology of music (pp.
171-216). New York: Plenum Press.
Clynes, M., & Walker, J. (1986). Music as time's measure. Music Perception, 4 (1),
85-119.
Cooper, G., & Meyer, L. (1966). The rhythmic structure of music. Chicago: The
University of Chicago Press.
Donington, R. (1963). The interpretation of early music. New York: St. Martin's Press.
Donington, R. (1980). Tempo. In S. Sadie (Ed.), The new Grove dictionary of music and
musicians (Vol. 18). New York: Macmillan.
Dowling, W. J. (1994). Melodic contour in hearing and remembering melodies. In R.
Aiello (Ed.), Musical perceptions (pp. 173-190). New York: Oxford University Press.
Dowling, W. J. (1993). Procedural and declarative knowledge in music cognition and
education. In T. J. Tighe & W. J. Dowling (Eds.), Psychology and music. The
understanding of melody and rhythm (pp. 5-18). Hillsdale, NJ: Erlbaum.
Epstein, D. (1985). Tempo relations: A cross-cultural study. Music Theory Spectrum, 7,
34-71.
Farnsworth, P., Block, H., & Waterman, W. (1934). Absolute tempo. Journal of General
Psychology, 10, 230-233.
Forte, A. (1979). Tonal harmony in concept and practice. New York: Holt, Rinehart &
Winston.
Francès, R. (1988). The perception of music (trans. by W. J. Dowling). Hillsdale, NJ:
Erlbaum.
Sachs, C. (1953). Rhythm and tempo: A study in music history. New York: W.W. Norton.
Goody, W. (1958). Time and the nervous system. The Lancet, 7031, 1139-1141.
Gould, G., & Page, T. (Winter 1982-83). Excerpts from an interview with Tim Page. The
Piano Quarterly, 120, recording.
Halpern, A. R. (1988). Perceived and imagined tempos of familiar songs. Music
Perception, 6 (2), 193-202.
Hargreaves, D. J. (1988). The developmental psychology of music. Cambridge, UK:
Cambridge University Press.
Harrison, R. (1941). Personal Tempo. Journal of General Psychology, 24 & 25, 343-379.
Hodgson, W. (1951). Absolute tempo: Its existence, extent, and possible explanation.
Proceedings of the Music Teachers National Association, XLIII, 158-169.
Huxley, A. (1960). The doors of perception. London: Chatto & Windus.
Jacques-Dalcroze, E. (1912). Rhythm, music and education. London: Chatto & Windus.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception,
attention, and memory. Psychological Review, 83, 323-355.
Kirkpatrick, R. (1984). Interpreting Bach's well-tempered clavier: A performer's discourse
on method. New Haven: Yale University Press.
Kramer, J. D. (1988). The time of music: new meanings, new temporalities, new listening
strategies. N. Y.: Schirmer Books.
Lapidaki, E. (2000). Stability of tempo perception in music listening. Music Education
Research, 2 (1), 25-44.
Lapidaki, E. (1992). Time. In B. Reimer & J. Wright (Eds.), On the nature of musical
experience (pp. 246-248). Niwot, CO: The University Press of Colorado.
Lapidaki, E., & Webster, P. (1991). Consistency of tempo judgements when listening to
music of different styles. Psychomusicology, 10 (1), 19-30.
Lapidaki, E. (1990, July). L' imagination au pouvoir: Some riddles on the issue. Paper
presented at the International Symposium on Research and Teaching in the Philosophy of
Music Education, Bloomington, IN.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA:
MIT Press.
Lester, J. (1982). Harmony in tonal music (Vol. 2). New York: Alfred A. Knopf.
Levitin, D. J., & Cook, P. R. (1996). Memory for musical tempo: additional evidence that
auditory memory is absolute. Manuscript submitted for publication.
Loulie, E. (1696). Elements ou principes de musique [Elements or principles of music].
Paris: Presses Universitaires de France.
Lund, M. (1939). An analysis of the "true beat" in music. Unpublished doctoral
dissertation, Stanford University.
Margulis, V. (1984). Tempo relationships in music. Isny, Germany: Rudolf Wittner, GmbH
and Co.
Miles, D. W. (1937). Preferred rates in rhythmic response. Journal of General Psychology,
16, 427-469.
Mishima, J. (1956). On the factors of mental tempo. Japanese Psychological Research, 4,
27-38.
Pöppel, E. (1990). Unmusikalische Grenzüberschreitungen? [Unmusical crossings of
limits?]. In C. R. Pfaltz (Ed.), Musik in der Zeit [Music in time] (pp. 105-124). Basel,
Switzerland: Helbing & Lichtenhahn.
Petzold, R. G. (1963). The development of auditory perception of musical sounds by
children in the first six grades. Journal of Research in Music Education, 21, 99-105.
Piston, W. (1978). Harmony. New York: W. W. Norton.
Radocy, R. E. (1980). The perception of melody, harmony, rhythm, and form. In D. A.
Hodges (Ed.), Handbook of music psychology. National Association for Music Therapy.
Reckziegel, W. (1961). Musikanalyse: Eine exakte Wissenschaft? [Musical analysis: An
exact science?]. In H. Heckmann (Ed.), Elektronische Datenverarbeitung in der
Musikwissenschaft [Electronic data processing in musicology]. Regensburg: Gustav Bosse
Verlag.
Reinecke, H. P. (1974). Vom musikalischen Hören zur musikalischen Kommunikation
[From musical hearing to musical communication]. In B. Dopheide (Ed.), Musikhören
[Musical hearing]. Darmstadt: Wissenschaftliche Buchgesellschaft.
Rimoldi, H. J. A. (1951). Personal tempo. Journal of Abnormal and Social Psychology, 46,
280-303.
Shuter-Dyson, R., & Gabriel, C. (1981). The psychology of musical ability (2nd ed.).
London: Methuen.
Sloboda, J. A. (1994). Music performance: expression and the development of excellence.
In R. Aiello (Ed.), Musical perceptions (pp. 152-172). New York: Oxford University Press.
Wagner, C. (1974). Experimentelle Untersuchungen über das Tempo [Experimental
investigations of tempo]. Österreichische Musikzeitschrift, 29, 589-604.
Wallin, J. (1911). Experimental studies of rhythm and time (Parts 1 & 2).
Psychological Review, 18, 100-133 & 202-222.
Winckel, F. (1967). Music, sound, and sensation: A modern exposition. New York: Dover
Publications.
Winckel, F. (1962). Optimum acoustic criteria of concert halls for the performance of
classical music. The Journal of the Acoustical Society of America, 34 (1), 81-86.
Wolpert, R. S. (1990). Recognition of a melody, harmonic accompaniment, and
instrumentation: musicians and nonmusicians. Music Perception, 8, 95-106.
Zenatti, A. (1993). Children's musical cognition and taste. In T. J. Tighe & W. J. Dowling
(Eds.), Psychology and music. The understanding of melody and rhythm (pp. 177-196).
Hillsdale, NJ: Erlbaum.
SHORT NOTE OF BIOGRAPHIC DETAILS
Eleni Lapidaki is a professor of music education and psychology at the Department of Musical Studies,
Aristotle University of Thessaloniki, Greece, and a research fellow at the Center for the Study of
Education and the Musical Experience (CSEME), School of Music, Northwestern University, U.S.A.
Her dissertation from Northwestern University was given the "Outstanding Dissertation in Music
Education Award" by the Council for Research in Music Education (CRME) at the 1998 Music
Educators National Conference (MENC) in Phoenix, AZ. Her research concerns a closer interaction
between the artistic, scientific, and pedagogical aspects of temporal experience in music. It was
published in the book On the Nature of Musical Experience (Eds. Bennett Reimer & Jeffrey Wright), in
Psychomusicology (co-authored with Peter Webster), and in Music Education Research (Vol. 2, No. 1, 2000).
She also presented her research at the 1999 "Research in Music Education: An International
Conference," University of Exeter, and at the 1999 Conference of the Society of Music Perception and
Cognition (SMPC).
Proceedings paper
Patrick J. O'Donnell
Department of Psychology, University of Glasgow, Adam Smith Building, Hillhead, Glasgow, G12
8RT, UK.
Introduction
Previous research has demonstrated developments in musical and psychological variables for
individuals with learning disabilities following a 10-week music intervention (MacDonald, O'Donnell
& Davies, 1999). The psychological mechanism underpinning these developments was highlighted as an
area for future research and is the focus of the study presented here. The purpose of this study was to
investigate developments in joint attention made by individuals with a learning difficulty who
participated in structured music workshops.
Individuals with learning disabilities are one of the main target populations for music interventions
with a therapeutic focus (Aldridge, 1993; Oldfield & Adams, 1990; MacDonald & O'Donnell, 1996).
In addition, a wide range of outcomes has been claimed in the published research, including
behavioural improvements, reduction in anxiety, improved motor coordination and enhanced
communication skills (Aldridge, 1993). Moreover, a number of authors have suggested that music
interventions may offer an environment within which individuals with a learning disability can
develop social, cognitive, and physical skills that may enhance their life experiences (Aldridge, 1993;
Oldfield & Adams, 1990; Wigram, 1995). It has been noted, however, that there is still a need for
empirical research that further investigates the process and outcomes of therapeutic and educational
music programmes for individuals with learning disabilities (Radhakishnan, 1991; Purdie, 1997;
MacDonald, O'Donnell & Davies, 1999; Schalkwijk, 1994). Such studies will develop understanding
of the nature of music interventions in a modern health care context (Bunt, 1994; Oldfield & Adams
1990; MacDonald, O'Donnell, & Dougall, 1996; Wigram, 1995).
MacDonald, O'Donnell and Davies (1999) reported studies that focused on investigating the outcomes
of music programmes for individuals with learning difficulties. The results highlighted significant
improvements in musical ability, communication skills, and self-perception of musical ability. While
the studies produced results supporting the efficacy of the intervention, the psychological mechanism
underpinning these developments remained unclear and a crucial area for future research. It is this
issue that the present paper addresses.
Joint attention
Joint attention structure is well explored in language development work (Hughes, 1998; Morales,
Mundy, & Rojas, 1998; Sigman, 1998). For example, mothers who spend a longer time in
linguistically active joint attention have children with larger vocabularies and more developed
syntactic structures (Tomasello, 1992, 1995; Tomasello & Todd, 1983). In this context, joint attention
is defined as a shared focus of attention to the same object by caregiver and child. A crucial point here
is that the music workshop environment and the environment in which the participants were tested
contained examples of classic joint attention situations.
Recent evidence suggests that joint attention is disrupted in children of non-typical development
(Harris, Kasari, & Sigman, 1996; McCathren, Yoder, & Warren, 1995). For example, children with
Down syndrome find situations of joint attention particularly difficult (Kasari, Freeman, Mundy, &
Sigman, 1995; Roth & Leslie, 1998). The problems found with the management of joint
attention-based language in children with Down syndrome may have implications for
developmentally disadvantaged groups generally, since the basic mechanism involves the
management of a limited cognitive resource in what is a dual task situation. A study comparing
handicapped and non-handicapped infants argued that joint attention deficits could be ameliorated by
an appropriate intervention strategy (Yoder & Farran, 1986). Given the nature of the Gamelan
workshop environment, it is suggested here that development in joint attention is a possible benefit for
the participants.
To further understand this link between joint attention and the workshop/assessment environment, it is
important to consider in more detail the characteristics of joint attention situations. Individuals must
maintain attention on the object, orientate to the speaker, perhaps by changing visual direction,
reorientate to the object, retain a working memory representation of the speaker's utterance, apply it to
the object in hand, reconstitute the object as a focus of shared attention, and ensure that the correct
features and implications are being shared between the participants, perhaps recheck by using NVC
cues, and finally organise some response. All of these skills are involved in the musical and
assessment tasks used in this experiment.
In summary, MacDonald, O'Donnell and Davies (1999) have reported musical and psychological
developments as a result of a music intervention. The reasons as to why these developments take
place, however, are an important area for research. Given that individuals with learning difficulties
have been shown to have deficits in joint attention, and that the musical and assessment environment
contain examples of joint attention situations, it is suggested that participants at the music workshops
will display development in joint attention.
Methods
Participants
The study contained 40 participants. These individuals had either mild or moderate learning disabilities
and, at the time of the study, were receiving health care from a number of institutions in central
Scotland. Participants were randomly assigned to either the experimental group or an intervention control
group. Although 20 individuals (10 males and 10 females) were originally pre-tested in the
experimental group, one female was dropped from the study because of illness, and only 19 (10 males
and 9 females) participants were post-tested and used in the analysis. The chronological age ranged
from 17 to 58 years and their mean age was 40.4 years (S.D.=8.41). The intervention control group
contained 21 participants. The chronological age range of the group was 25-43 years, with a mean age of 37.6
(S.D.=7.01).
Equipment
The music intervention was a percussion-based workshop programme focused upon playing a series
of instruments from Indonesia known as a Javanese Gamelan. Gamelan is a generic name for a set of
tuned percussion instruments consisting of gongs, metallophones, cymbals, and drums that can be
found throughout Malaysia and Indonesia and range in size from 4 to 40 instruments (Lindsay, 1989).
Use of the Gamelan for a population of individuals with learning disabilities was outlined by Sanger
& Kippen (1987), who describe a particular musical and social event in which the Gamelan was used
as part of a 2-week music programme.
Participants in both groups were pre- and post-tested on the following validated measurement
instruments: (a) The Elmes test of musical attainment (MacDonald & O'Donnell, 1994, 1996); (b) The
communication assessment profile for adults with a mental handicap (CASP) part 2 section 3 (van der
Gaag, 1988, 1989, 1990); and (c) Self-Perception of Gamelan ability visual analogue question
(MacDonald & O'Donnell, 1996). In addition, all assessment sessions were videotaped using a VHS
video recorder.
Procedure
Ethical approval was obtained from The Greater Glasgow Research and Ethical Committee, and
participation in both the experimental and control group was voluntary. Participants were able to
withdraw at any time. Participants in the experimental group were pre-tested and then attended
weekly workshops for 10 weeks. These workshops lasted approximately 90 minutes and began with
rhythm exercises. The purpose of this warm-up session was to relax the group and help set up
cohesive group dynamics that are essential to the success of a workshop. The rest of the time was
usually given over to playing the Gamelan. Various methods were employed by the workshop leader
to communicate the musical ideas to the participants. Initially, participants were asked to repeat a
rhythmic pattern being played on one of the Sarons. More complex patterns were played as the
workshop progressed, and there was opportunity for improvisation within the context of any piece of
music. The participants also had the opportunity to select a particular part of the Gamelan. The
emphasis was on group involvement and rhythmic awareness through musical participation while at
the same time attempting to cater for the individual needs of participants.
The organisational vehicle for delivery of the musical intervention was Sounds of Progress (SOP), a
music and theatre company that draws 75% of its musicians from within the special needs sector. SOP
provides opportunities for people with special needs to explore their creativity through music.
Ongoing work includes providing music therapy, delivering music workshops, and recording and
performance projects. The company encourages musicians to develop their skills to the highest
standard and has an explicit educational objective in terms of developing the musical skills and
awareness of all individuals who participate in SOP activities. The company focuses on enhancing a
wide range of musical skills; developing rhythmic ability on percussion instruments, singing
skills, and compositional and improvisational skills are among its objectives.
Participants in the intervention control group were pre-tested and then attended communal cooking or
art classes once a week for 10 weeks. The art group met once a week for approximately 90 minutes.
There were two groups of six individuals with one occupational therapist present in each group. The
two groups each worked on a separate piece of art. This was a large painting (10m X 5m) to be
completed over the 10-week period. The cooking group also met once a week for 10 weeks with two
sub groups containing six individuals. The class lasted approximately 90 minutes with one
occupational therapist present for each group. Each week the groups would prepare and eat a meal
together. Following the 10-week programme, all participants were post-tested.
All the assessment sessions during pre- and post-testing were videotaped, and key aspects of non-verbal
communication were focused upon in the analyses. Conceptually, the foci of measurement were
indicators of attention structure, joint attention structure, and divided attention (between the
experimenter and the task). Since the different aspects of the testing situation have different task
demands, measures were constructed for the different components. The order of testing involved first
a clapping sequence or set of sequences, then a Gamelan tone copying session, a Gamelan note
sequence set of tasks, and a communication testing session. Also included was self-assessment of
musical confidence. The Communication task involves two phases: scanning a set of pictures on a
page and identifying the one demanded by the experimenter followed by responding to the
experimenter pointing at a picture; and asking what the object is and what it does. As has been argued
earlier, all the tasks involve examples of joint attention. The targets of the operationalised measures
were precisely these features of joint attention.
In the clapping task, joint attention was defined as both participants focusing on the experimenter's
hands during the clapping, or the participant focusing in pace when listening to the sound. Some
participants had trouble doing either and allowed their attention to wander, usually by looking at the
experimenter's facial cues. Times were measured to the second and expressed as a percentage of time
in joint attention. The number of times the participant interrupted the experimenter was also recorded
since it is a measure of failed attention. The participant has failed to appreciate the shared object.
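As a rough illustration of this scoring (the seconds below are invented for demonstration, not taken from the study), the percentage-of-time measure reduces to simple arithmetic:

```python
# Illustrative sketch only: the study timed joint attention to the nearest
# second and expressed it as a percentage of total task time.

def percent_joint_attention(joint_seconds: int, total_seconds: int) -> float:
    """Share of the task spent in joint attention, as a percentage."""
    return 100.0 * joint_seconds / total_seconds

# e.g. 45 s of a 60 s clapping task spent focused on the experimenter's hands
print(percent_joint_attention(45, 60))  # 75.0
```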
On the Gamelan sequence task, the number of times gaze was directed away from the keyboard
during the experimenters' demonstration was measured. On the CASP, two measures were taken. The
participants' delay in pointing, timed from the moment at which the experimenter finished the
instruction, was recorded (repeated or misheard instruction trials were ignored). Also, the participants'
delay in giving the name was recorded on the second part of the CASP. The CASP sessions involved
participants switching attention backwards and forwards from the experimenter to the task in hand,
the CASP booklet. Some participants coped with this by focusing gaze almost exclusively on the
booklet and relying on listening to the experimenter to orientate to the task demands. Others switched
attention to the experimenter at the beginning and end of the individual trials. Still others allowed
their attention to be taken up by checking the experimenter's facial cues even during any given trial.
The number of switches of attention was therefore recorded as a measure of attention allocation
strategy. Number of irrelevant comments made during the CASP session was used as an indicator of
attention distraction. Both the experimental and the control groups underwent this assessment prior to
and after the interventions.
Results
Table 1 summarises the non-verbal communication measures for the control and experimental
groups. It highlights that there are significant improvements in 5 of the 6 measures of NVC for the
experimental group, while there are no significant changes in the control group on any of the
measures. Specifically, the experimental group show significant improvement in joint attention as
defined by: time watching the experimenter clap; delay in pointing to the communication booklet;
time studying the communication booklet; number of attention switches and number of irrelevant
remarks.
Table 1. NVC Measures for Control and Music Groups for T1 and T2: Sign Test
Variable | Experimental: t1, t2, p | Control: t1, t2, p
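The proceedings reproduce only the sign-test probabilities, not the raw scores. As a minimal sketch of the test itself, with invented pre/post values rather than the study's data, an exact two-tailed paired sign test can be computed as follows:

```python
from math import comb

def sign_test(pre, post):
    """Exact two-tailed paired sign test; tied pairs are dropped."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    k = min(pos, n - pos)
    # Under H0 the sign of each difference is a fair coin flip,
    # so p is twice the binomial tail probability P(X <= k).
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Invented example: seconds in joint attention before and after training
pre  = [12, 15, 10, 18, 14, 11, 16, 13, 12, 17]
post = [20, 19, 14, 22, 18, 15, 21, 16, 17, 23]
print(sign_test(pre, post))  # 0.001953125 (all 10 pairs improved)
```

Note that "improvement" runs in opposite directions across the paper's measures (e.g. more time watching the experimenter, but fewer attention switches); the test itself is direction-agnostic.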
Discussion
The results clearly show significant improvements on measures of non-verbal communicative
functioning in the experimental or music group, and not in the control group, which was exposed to a
joint social task (a cooking or art class). All the measures showed improvement from time one to time
two, with the exception of the interrupt measure, which did not offer sufficient variability to be
analysed. Note that the number of irrelevant comments goes up in the control group; in fact, they become
more talkative in an irrelevant way as time goes on. What is perhaps more surprising is the lack of
progress made by this group. After all, they are tested on the CASP for the second time, and they
have, by the time of second testing, extensive experience of a social task. The control task, however,
does not show generalisation. The CASP improvement in the first instance and the related
improvements on the NVC communicative measures indicate that, for the experimental group, some
aspect of the music training is generalising to communicative competence. In operational terms, the
music group spends more time attending to the experimenter when he is giving instructions, but less
time when he is not. They listen rather than make comments, and they concentrate on the task when
the instructions finish without switching attention back and forth to the experimenter. They listen
when they should, and when they should concentrate on the CASP, they shut out the experimenter.
The earlier analysis of the cognitive demands of the CASP in the introduction emphasised the
complex nature of the assessment and its relationship to joint attention. Why should the music training
generalise to this task? The first point to be made is that music participants do get better on the music
variables and that this improvement correlates with improvement on the CASP (MacDonald &
O'Donnell, 1996). Given the importance of, for example, rhythmic processes to speech and to motor
coordination, it is possible that the training has developed mechanisms supporting language. A crucial
feature of the Gamelan training is its mix of music and cooperative sociability. The explanation for
some task generalisation should emphasise the precise nature of the experimental group's activity.
What marks the Gamelan is the combination of listening to instructions, paying attention to other
people's performance and appropriately executing one's own. It does involve executing a planned
sequence in the context of a joint attention task. If the joint attention parallels between the CASP and
Gamelan are pursued, then the question becomes: "Why does practice on one joint attention problem
generalise to another?" An answer needs brief reference to research on joint attention structures.
Recent work on the topic suggests that joint attention is subject to developmental processes interacting
with experiences, and that it is partly an acquired skill (Hughes, 1998; Roth & Leslie, 1998;
Tomasello, 1995). In the case of the Gamelan workshops, practice will improve the music task, and
also certain features of communication. Participants will improve their recognition of others' interrupt
and activity signals. But what do participants learn that generalises? One possibility is that they learn
to concentrate on one thing at a time. Participants can either attend to the music or try to worry about
others simultaneously. What they must learn to do is to inhibit other people at least for a time. The
skill that might generalise is learning not to attend to the social cues unless at particular moments. The
evidence from the CASP-based non-verbal measures shows that they are paying attention to one thing at
a time, suppressing irrelevant talk and looks, and being more focused on and quicker to the task.
In conclusion, it can be argued plausibly that music interventions are an ideal focus for the
development of joint attention skills. Research evidence suggests these joint attention skills can be
taught (Girolametto, Verbey, & Tannock, 1994). A common object that the child or client follows
naturally is the key element and it may be that music provides the ideal common focus of attention.
References
Aldridge, D. (1993). Music therapy research 1: A review of the medical research literature within a
general context of music therapy research. The Arts in Psychotherapy, 20, 11-35.
Bunt, L. (1994). Music therapy: An art beyond words. London: Routledge.
Girolametto, L., Verbey, M., & Tannock, R. (1994). Improving joint engagement in parent child
interaction: An intervention study. Journal of Early Intervention, 18, 155-167.
Harris, S., Kasari, C., & Sigman, M. D. (1996). Joint attention and language gains in children with
Down syndrome. American Journal on Mental Retardation, 100, 608-619.
Hughes, C. (1998). Executive function in preschoolers: Links with theory of mind and verbal ability.
British Journal of Developmental Psychology,16, 233-253.
Lindsay, J. (1989). Javanese Gamelan: Traditional orchestra of Indonesia. Oxford: Oxford
University Press.
Kasari, C., Freeman, S., Mundy, P., & Sigman, M. D. (1995). Attention regulation by children with
Down syndrome: Coordinated joint attention and social referencing looks. American Journal on
Mental Retardation, 100, 128-136.
MacDonald, R. A. R., & van der Gaag, A. (1990). The validation of a language and communication
assessment procedure for use with adults with intellectual disabilities. Health Bulletin, 48(5), 254-260.
Wigram, T. (1995). A model of assessment and differential diagnosis of handicap in children through
the medium of music therapy. In T. Wigram, B. Saperston & R. West, (Eds.), The art and science of
music therapy (pp. 181 - 194). London: Harwood Academic Publishers.
Yoder, P. J., & Farran, D. C. (1986). Mother-infant engagements in dyads with handicapped and
nonhandicapped infants: A pilot study. Applied Research in Mental Retardation, 7, 51-58.
Proceedings paper
Discrimination and Interference in the Recall of Melodic Stimuli among School Children
This study represents an extension of research published over three decades ago concerning melodic memory
(Collings, 1966; Madsen, Collings, McLoed & Madsen, 1969; Madsen & Staum, 1983). Research concerning
memory has been long and continuous, seemingly because memory appears to be an important ingredient in
human life, general education and especially in music education (Deutsch, 1971; 1972; 1973; 1975; Dowling,
1973; Murdock, 1974; Tanguiane, 1994).
It has been speculated that interference in memory comprises 85-95% of all forgetting. In verbal memory, the
detrimental effect of prior learning (proactive interference), and the detrimental effect of temporally later learning
(retroactive interference), are affected by the number of items in a sequence, by the length or amount of time
occurring between items, by the length of time between sequence presentation and beginning of recall, and by the
number of prior sequences learned (Murdock, 1974).
In tonal memory, interference for pitch appears to be generalized across all octaves (Deutsch, 1973), inhibited by
semantic content (Long, 1977), and enhanced by the interpolation of familiar melodies (Dowling, 1973). Memory
for tones also appears to be affected by attending to interpolations (Deutsch, 1971), by the similarity and
repetitions of tones within sequences (Deutsch, 1972, 1975) and by the placement of the test tone within an
interpolated sequence (Deutsch, 1975).
Although many studies have been concerned with memory perception of musical stimuli (Attneave & Olson,
1971; Cuddy & Cohen, 1976; Davies & Jenning, 1977; Davies & Yelland, 1977; Dowling, 1978; Dowling &
Fujitani, 1971), some controversy exists suggesting that memory for single tones is distinctly different from
process which functions as a melodic gestalt for musicians (Davies, 1980; Taylor, 1980). Croonen (1991)
describes two experiments where a tonic triad was placed at various places through a series of tones and indicates
that the location of a tonic triad in a tone series is important for recognition. Tsuzaki (1991) examined the effects
of musical context to determine under what conditions melodies retain their gestalt properties and found that "the
significant effect of the starting tone in the diatonic condition suggests that the presentation of the diatonic scale
might have imposed a strong anchoring point."
Duration of sequences and time allowed for recall have also been associated with the ability to recall tonal entities
(Deutsch, 1972; Massaro, 1970, 1971; Wickelgren, 1969), as has the length of sequences (Taylor, 1972), and the
length of pause occurring between test tone and the beginning of interpolated material (Deutsch, 1978). Presently,
there are various theories of music memory that attest to the ongoing controversy in exploring this important
activity (Booth & Cutietta, 1991; Cutietta & Booth, 1995; Large, Palmer & Pollack, 1995; Tanguiane, 1994).
Additionally, appropriate methodology in investigating music cognition has also been advanced (Krumhansl,
1990).
In early music research by Collings (1966), college music majors were tested for retention of major/minor
melodies with duple and triple meter after one, two, three, four, and five interpolations of similar melodies.
Results indicated that even the slightest differences in melodies were differentiated and that subjects were better
able to discriminate a duple rhythmic background in combination with a harmonic minor scale. A similar study
was administered to non-music majors that yielded similar results (Madsen et al., 1969).
A large-scale study by Madsen and Staum (1983) assessed 400 college students' ability to recognize a target
melody after several other similar melodies were played. Results showed that
Subjects were able to discriminate effectively throughout the stimulus presentations with
an extremely high accuracy for those specific melodies that were identical to the test
melody. In addition, when melodies were identical except for slight modifications,
melodies presented in duple meter appeared less susceptible to interference than
melodies in triple meter or than melodies having modal changes. As would be expected,
accuracy declined across interpolated melodies; however, even after 8 interpolated
melodies, subjects recalled the test melody with at least 43% accuracy (Madsen &
Staum, 1983).
Robertson (1998) recently completed a replication of the Madsen and Staum study in assessing differences
between Asian subjects compared to subjects in the original study. Her results indicated that there were indeed
patterns of differences when comparing Chinese subjects with their United States counterparts with Chinese
subjects demonstrating greater accuracy for both the simple and duple meters. In another study examining the
effects of age, music experience, and style of musical stimuli on recall of transposed melodies the authors
concluded that both "age and experience effected different aspects of the task, with experience becoming more
influential when interference was provided" (Halpern, Bartlett & Dowling, 1995). Additionally, Boltz (1998)
investigated the relationship between an event's temporal (e.g., rhythm, rate, total duration) and nontemporal
information (e.g., sequence of pitch intervals) and found that the nature of encoding is strongly dependent on the
structure of environmental events and the degree of learning experience. Thus, this entire line of research indicates
that more research should be done with subjects of various ages as well as with different levels of musical sophistication.
It was the purpose of the present investigation to assess the nature of forgetting (interference) in retaining melodic
sequences and to identify the characteristics of the perceptual memory process for these same musical stimuli for
grade school students compared to adults. This study represents an extension of the Madsen & Staum (1983) and
the Robertson (1998) line of research using identical stimuli with the exception that all melodies were re-mastered
using current technology.
Method
Subjects
Sixty-two elementary-aged students (ages 9-12, mean age 10.88) were randomly selected for the sample. These
children attended a large university developmental research school. This school has
a long-standing policy of selecting students such that the total student body is representative of the greater
demographic distribution of a much larger community of approximately 250,000.
Sixty young adults (mean age 22.87) who had had no private instruction and less than three years of formal group
music study, and 60 trained musicians (mean age 23.66) with a minimum of 10 years of formal individual
and/or group music instruction, were selected from a large university. Untrained and musically trained adult
subjects were randomly selected from intact classes within this large university with a total population of 30,000;
the trained musicians were selected from the same institution's school of music, which has a total music
population of approximately 1,000.
Apparatus
All melodies were programmed using Opcode's Studio Vision Pro sequencer. The tone source was a Kawai
G-Mega using an acoustic piano patch. The tone module was set for equal temperament, A = 440 Hz. All MIDI
velocities were set to 64. Melodies were digitally recorded at a sample rate of 44.1 kHz, 16-bit stereo using Pro
Tools hard disk recording software. The results were burned to an audio compact disc.
Melodies
Melodies used in this study were identical to those used in both the Madsen & Staum (1983) and the Robertson
(1998) studies except that they were re-programmed using current technology (see above).
Four melodies were used for the study, each derived from a descending diatonic scale beginning and ending on
Eb. Each melody was altered in mode (major and minor) and meter (compound and simple) to produce, in total,
sixteen different melodies modeled after Collings (1966). Thus, each of the four original melodies appeared in four
versions: (a) compound major (CM), (b) compound minor (cm), (c) simple major (SM), and (d) simple minor (sm). Each of the
sixteen melodies was used one time each as the test melody for the sixteen trial tests. These same sixteen melodies
were also used as distraction for each other.
Within each test trial, nine consecutive melodies were heard. The test melody always occupied the first
position, and the same melody appeared again in one of the other eight recall positions within the trial.
Each of the eight recall positions was tested twice within the sixteen examples. Additionally, in half of the trials
the test melody also appeared in another recall position with a meter change only (R= rhythm) and in the other
half, with a mode change only (M= mode). All other melodies for each example were randomly selected from the
group of sixteen original melodies with the condition that no melody appeared twice within one example.
Additionally, the eight melodies that followed the initial melody were randomly determined. Thus, for every trial,
an initial test melody was played, the identical melody was played again, the same melody with a mode or meter
change was played, as were six other interpolated melodies. Each melody lasted 10 seconds and was
recorded with a 3-second pause between interpolated melodies and a 15-second pause between the end of one
example and the beginning of the next. Therefore, each trial spanned one minute, 54 seconds from the beginning
of the first melody to the end of the last melody.
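The trial structure described above can be sketched in a short Python illustration. The melody labels, the `alter` helper, and the generator below are our own hypothetical notation; only the structural constraints (nine melodies per trial, test melody first, one identical and one altered recall, no filler repeated, 10-second melodies with 3-second pauses) come from the text.

```python
import random

# Sixteen melody labels: melody number 1-4, meter C(ompound)/S(imple),
# mode M(ajor)/m(inor). The labels themselves are hypothetical shorthand.
MODES = ["M", "m"]
METERS = ["C", "S"]
MELODIES = [f"{n}{me}{mo}" for n in range(1, 5)
            for me in METERS for mo in MODES]   # 16 melodies in all

def alter(melody, alteration):
    """Apply a meter change ('R') or a mode change ('M') to a melody label."""
    n, meter, mode = melody[0], melody[1], melody[2]
    if alteration == "R":
        meter = "S" if meter == "C" else "C"
    else:
        mode = "m" if mode == "M" else "M"
    return n + meter + mode

def build_trial(test_melody, identical_pos, altered_pos, alteration, rng=random):
    """Return a 9-slot trial: the test melody in position 1, the identical
    melody again at identical_pos (2-9), an altered copy at altered_pos,
    and randomly chosen fillers elsewhere (no filler appears twice)."""
    altered = alter(test_melody, alteration)
    fillers = [m for m in MELODIES if m not in (test_melody, altered)]
    rng.shuffle(fillers)
    trial = [None] * 9
    trial[0] = test_melody
    trial[identical_pos - 1] = test_melody
    trial[altered_pos - 1] = altered
    for i in range(9):
        if trial[i] is None:
            trial[i] = fillers.pop()
    return trial

# Timing check: 9 melodies x 10 s + 8 inter-melody pauses x 3 s = 114 s,
# i.e., one minute 54 seconds, as stated in the text.
total_seconds = 9 * 10 + 8 * 3
```

For example, `build_trial("1CM", 4, 6, "M")` places the identical melody in slot 4 and a mode-changed copy in slot 6 of the nine-melody sequence.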
Procedures
A special answer sheet was developed for the elementary population and the regular music teacher did the testing.
Thus the procedure was nested within the natural environment of these youngsters. When subjects arrived in the
experimental room they were given an answer sheet with written instructions to be followed along with taped
directions. The task involved listening to the first melody, then comparing each additional melody with the very
first one by marking S (same as the first melody) or D (different from the first melody) in each of eight
consecutive boxes as each additional melody was played. Subjects were given three practice examples to complete
before testing was initiated. The duration of the test was approximately 30 minutes with an additional ten-minute
break after the first eight melodies. Subjects were randomly assigned to the two testing conditions: Half of the
subjects were given the first part of the test (examples 1-8) first, while the remaining subjects were first
administered examples 9-16. Two separate experiments were administered on subsequent days. In Experiment A,
the effect of simple and compound alterations was controlled by placing all compound major and
minor identical melodies in Recall Positions 2, 4, 6, or 8 and all simple major and minor identical melodies in
Recall Positions 3, 5, 7, or 9. In Experiment B, the effects of modal alterations were controlled by placing all major
simple and compound identical melodies in Recall Positions 2, 5, 6, or 9 and minor melodies in Positions 3, 4, 7,
or 8.
Adult populations were also tested on two successive days with the orders of the examples reversed for half of the
subjects on both experiments as described above.
Results
Responses were initially calculated by determining the percentage of individuals who perceived each melody as
identical to the initial test melody whether or not it was actually the same. In this manner each group was given an
overall percentage score across the various melodies whereby rank orders could be determined. Each group's
percentage scores of correct responses were then computed across the eight interpolated positions. Discrimination
across melodic interpolations was indeed apparent for each group and indicated that subjects discriminated
differentially across the 16 examples in each temporal position.
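The per-position scoring described above can be illustrated with a minimal sketch. The response strings below are hypothetical, and `percent_correct_by_position` is simply one way to compute per-position accuracy from the S/D marks subjects made in the eight recall boxes.

```python
# A minimal scoring sketch. Each subject marks S (same) or D (different)
# for the eight recall positions of a trial; the answer key gives the
# correct mark for each position. Data here are hypothetical.
def percent_correct_by_position(responses, key):
    """responses: list of 8-character 'S'/'D' strings, one per subject.
    key: 8-character string of correct marks for this trial.
    Returns the percentage of correct responses at each position."""
    n = len(responses)
    return [round(100 * sum(r[i] == key[i] for r in responses) / n, 1)
            for i in range(8)]

responses = ["SDDSDDDD", "SDDSDDDS", "DDDSDDDD"]  # three hypothetical subjects
key = "SDDSDDDD"  # identical melody actually recurred at two recall positions
print(percent_correct_by_position(responses, key))
# -> [66.7, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 66.7]
```

Rank orders of this kind, computed per group and per position, are what the analyses below summarize.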
Statistical analysis indicated that there were differences across the three groups, as would be expected (H = 12.40,
p < .001). Percentages of total correct responses were significantly different across populations: 53% for the
children, 69% for untrained subjects, and 74% for trained musicians. Even after 1 to 8 extremely similar melodies, most
subjects were able to identify the original melody. Recall position 1 yielded 66% accuracy for children, 89% for
untrained adults, and 89% for trained musicians; recall position 8 evidenced 39% for children, 62% for
non-trained adults, and 66% for trained musicians.
Further nonparametric analyses indicated that while there was no significant difference for the adults between
experiments A and B, there was a significant difference for the children.
Experiment A: Grade School
A rank ordering of responses for each placement yielded the highest ranks (first and second ranks) for
differentiation of the identical melody when it was in fact identical, in the temporal positions of 2, 4, 5, and 8. In
position 3, the identical responses were ranked first and third; for position 6, first and fifth; for position 7, first and
seventh; and for position 9, second and sixth. It was also noted that in two positions a relatively greater percentage of subjects
perceived the recall melodies as the same when they were the identical melody with a mode or meter change. In
fact, a consistently higher trend of reporting "same" occurred for identical melodies with modal alterations than
with meter alterations.
Experiment B: Grade School
The most important finding for this group in this trial arrangement was that in every case the percentage of correct
responses was less than for the same melodies when presented in a different arrangement (experiment A above).
The rank order indicates that only in positions 2 and 8 did subjects choose the correct answers as their top choices.
In positions 3, 5 and 6 the correct responses were ranked first and third. In position 4 correct responses were
ranked as third and fifth, and in position 7 correct responses were ranked first and fourth (irrespective of ties).
Position 9 received the lowest correct response where correct responses were rated sixth and eleventh indicating
much greater interference.
Experiment A: Adult Nonmusic
It should be remembered that this group represented a replication of the original study (Madsen & Staum, 1983)
using current technology and re-mastered tapes. Results indicated almost identical scores in both Experiments A
& B. In positions 2, 3, 4, 5, 7, and 9 the top responses were always correct. In position 6 the highest response was
ranked first but the second response was ranked third. In position 8 the correct responses were ranked second and
fourth.
Experiment B: Adult Nonmusic
Correct responses were ranked both first and second for positions 2, 3, 6, and 9, while all other positions were
ranked first and third.
Experiments A & B: Adult Musicians
Correct responses from the Adult Music Group were ranked in first and second places throughout the entire 32
trials.
While it is not surprising that musicians were better at this task, it is interesting that there was a great deal of
variability across the positions for all groups. For example, the collective correct response for the grade school
subjects was slightly more accurate on position 4 in experiment A than either the adult nonmusic group or the
musicians. While the musicians were better overall, the adult nonmusic group was better on the third position
when compared to the music group.
Discussion
Consistent with previous studies (Collings, 1966; Madsen et al., 1969; Madsen & Staum, 1983; Robertson, 1998),
it appears that subjects are able to discriminate among melodies that are identical or very similar to an initial
melody over an extended period of time. The most important finding of this research indicates that even young
children have the ability to discriminate among highly similar melodies even with many interpolations. It should
be remembered that all 144 melodies were actually derived from one basic melody and all tested melodies were
extremely similar structurally.
While in previous research (Madsen & Staum, 1983) non-music-major adults correctly perceived 69% of the
melodies as being identical (even with 9 interpolated melodies), young children in the present study correctly
perceived 53%. In Experiment A the correct total for the children was 59%, dropping to 46% in Experiment B.
While students completed Experiment B on a different day, it is speculated that the previous listening for
Experiment A produced inattentiveness during the subsequent testing of Experiment B; the drop in total
correct responses may have been due entirely to fatigue, given that correct responses were almost identical across
both sessions for both adult groups. Apparent differences among the three groups and within the two experimental orders are
interesting and may be important in future research, especially when the two tasks are considered separately. For
example, when correct responses are contrasted for Experimental Order A versus B only the children's group
demonstrated a significant difference between the two experimental configurations. Statistical analyses of these
data were purposefully conservative to avoid the "noise" inherent in such a large study. Further analyses might
tease out other relationships. Specifically, issues concerning relationships among the duple and compound meters
as well as the major versus minor relationships need further investigation.
It may be speculated that melodies are a form of "chunking" similar to that which has long been reported in verbal
memory (Miller, 1956). The musical "chunk" forms a comprehensible unit that is less vulnerable to interference
than unrelated single items. The degree of similarity of interfering stimuli, and the length or numbers of melodic
"chunks" retainable in a memory span, are still questions to be clarified in understanding melodic perception and
memory.
Implications of these data for music perception seem both important and interesting. As stated previously,
musicians are called upon to exhibit much fine discrimination and the ability to remember various musical
events. Additionally, the apparent ability of professional musicians to remember literally thousands of separate
"tunes" seems to be a very special ability, especially since many of these melodies seem to be quickly learned
after just initial modeling. The present research suggests that the apparent ability of even unsophisticated
youngsters to remember melodies is highly developed. Obviously, more research is warranted.
References
Attneave, F., & Olson, R.K. (1971). Pitch as a medium: A new approach to psychophysical scaling. American
Journal of Psychology, 84, 147-166.
Baddeley, A.D. (1976). The Psychology of Memory. New York: Basic Books, Inc.
Boltz, M. G. (1998). The processing of temporal and nontemporal information in the remembering of event
durations and musical structure. Journal of Experimental Psychology: Human Perception & Performance, 24(4),
1087-1104.
Booth, G.D., & Cutietta, R.A. (1991). The applicability of verbal processing strategies to recall of familiar songs.
Journal of Research in Music Education, 39(2), 121-131.
Croonen, W. (1991). Recognition of tone series containing tonic triads. Music Perception, 9(2), 490-498.
Collings, D.S. (1966). Principles of retroactive inhibition applied to melodic discrimination. (Unpublished
master's thesis, The Florida State University.)
Cuddy, L.L., & Cohen, J.A. (1976). Recognition of transposed melodic sequences. Quarterly Journal of
Experimental Psychology, 28, 255-270.
Davies, J.B. (1980). Memory for melodies and tonal sequences: A brief note. (Unpublished paper, University of
Strathclyde, Glasgow, Scotland.)
Davies, J.B., & Jennings, J. (1977). Reproduction of familiar melodies and the perception of tonal sequences.
Journal of the Acoustical Society of America, 61(2), 534-541.
Davies, J.B., & Yelland, A. (1977). Effects of two training procedures on the production of melodic contour in
short-term memory for tonal sequences. Psychology of Music, 5(2), 3-9.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in short-term memory. Science, 168,
1604-1605.
Deutsch, D. (1972). Effect of repetition of standard and comparison tones on recognition memory for pitch.
Journal of Experimental Psychology, 93(1), 152-162.
Deutsch, D. (1973). Octave generalization of specific interference effects in memory for tonal pitch. Perception
and Psychophysics, 13(2), 271-275.
Deutsch, D. (1975). Facilitation by repetition in recognition memory for tonal pitch. Memory & Cognition, 3(3),
263-266.
Deutsch, D. (1978). Interference in pitch memory as a function of ear or input. Quarterly Journal of Experimental
Psychology, 30(2), 283-287.
Dowling, W.J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5(3), 322-337.
Dowling, W.J. (1978). Scale and contour: Two components of a theory of memory for melodies. Psychological
Review, 85(4), 341-354.
Dowling, W.J., & Fujitani, D.S. (1971). Contour, interval and pitch recognition in memory for melodies. Journal
of the Acoustical Society of America, 49(2), 524-531.
Halpern, A.R., Bartlett, J.C. & Dowling, W.J. (1995). Aging and experience in the recognition of musical
transpositions. Psychology & Aging, 10(3), 325-342.
Krumhansl, C.L. (1990). Tonal hierarchies and rare intervals in music cognition. Music Perception, 7(3), 53-96.
Large, E.W., Palmer, C., & Pollack, J.B. (1995). Reduced memory representation for music. Cognitive Science,
19(1), 53-96.
Long, P.A. (1977). Relationship between pitch memory in short melodies and selected factors. Journal of
Research in Music Education, 25(4), 272-282.
Madsen, C.K., Collings, D., McLeod, B., & Madsen, C.H., Jr. (1969). Music and language arts. (Paper presented
at NAMT Regional Convention, Atlanta.)
Madsen, C.K. & Staum, M.J. (1983). Discrimination and interference in the recall of melodic stimuli. Journal of
Research in Music Education, 31(1), 15-31.
Massaro, D.W. (1970). Consolidation and interference in the perceptual memory system. Perception and
Psychophysics, 7(3), 153-156.
Massaro, D.W. (1971). Effect of masking tone duration on perceptual auditory images. Journal of Experimental
Psychology, 87(1), 146-148.
Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits of our capacity for processing
information. Psychological Review, 63, 81-97.
Murdock, B.B., Jr. (1974). Human Memory: Theory and Data. New York: John Wiley & Sons.
Robertson, B.J. (1998). Discrimination and interference in the recall of melodic stimuli by Asian students and
their spouses. (Unpublished Master's Thesis, Florida State University).
Taylor, J.A. (1972). Perception of melodic intervals within melodic context. (Doctoral dissertation, University of
Washington, 1971). Dissertation Abstracts International, 32, 6481A-6482A.
Taylor, J.A. (1989). Psychomusicology: Perceptual cognitive research with implications for the composer and
performer. (Paper presented at the College Music Society Annual Meeting, Denver.)
Tanguiane, A.S. (1994). A principle of correlativity of perception and its application to music recognition. Music
Perception, 11(4), 39-48.
Wickelgren, W.A. (1969). Associative strength theory of recognition memory for pitch. Journal of Mathematical
Psychology, 6, 13-61.
Proceedings paper
1. INTRODUCTION
Harmony theory and counterpoint are the main subjects in music education for traditional music and even for most contemporary
music. These subjects are based on Western classical music as established in the 17th and 18th centuries. In most university music courses in
Japan, harmony theory is taught using a standard but rather old textbook written by Yujiro Ikenouchi in 1964. Students are required to
learn harmony theory at the beginning of their courses, where "students" means not only those majoring in composition but also
those majoring in performance. The harmonic structures of most prevailing music, including popular music, are more or less of the same origin
as German-Austrian or Italian Baroque music.
Contemporary harmony theory consists of so-called inhibition rules, so students are required to understand the individual inhibition rules
together with the inter-relations among them. Learning harmony theory by oneself, however, is said to be difficult: although the individual
rules are not hard to understand, the mutual dependencies among them are complicated. The hardest thing for students
is that they cannot judge by themselves the correctness of their answers to given bass or soprano parts, because no exercise book
gives all the allowable answers to its subjects, owing to the abundance of possible solutions. Students are therefore forced to rely on their
instructors for judgements. Proposed here is a system that can judge whether a student's answer violates any of the inhibition rules and
thereby lets students study harmony theory by themselves. It is expected that studying with such a system, provided it gives appropriate
comments, has the same effect as studying under a tutor.
One approach to realizing such a system is to generate all the possible answers to given bass or soprano subjects. We have
constructed a prototype system that works on a subset of inhibition rules implemented in algorithms. This system, however, is not suitable
for supplementing and modifying rules or for pointing out errors in students' answers.
As conventional CAI systems usually employ a knowledge base, our system also employs one. To express the
inhibition rules of harmony theory, however, conventional methods such as the IF-THEN scheme, frame representation, or semantic networks
seem unsuitable. For example, if one tries to express an inhibition rule using the IF-THEN scheme, the description of
the conditional part becomes huge and complicated. Proposed in the second half of this paper is a "Rule Unit" model for expressing
inhibition rules: a rule unit represents a single rule, and a network of rule units represents the total system of inhibition rules in
harmony theory.
2. AN OUTLINE OF THE HARMONY THEORY
Basse donnée is the task of assigning the upper three voices to a given bass sequence while observing the inhibition rules designated in
harmony theory. A sequence of upper three voices is regarded as correct if it violates no inhibition rule. As the task of basse donnée
requires no aesthetic evaluation, there can be many allowable answers for a given bass sequence. No exercise book
gives all the allowable answers, because hundreds of correct answers are possible for a given bass sequence; this difficulty forms a
high barrier to students' self-learning of harmony theory. Moreover, some rules stated in one book are described in different forms in
other books, and some rules taught by one teacher differ from what others teach. Sometimes a teacher's personal or special rules are
even inconsistent with the rules described in textbooks. The structure of the rule system thus depends on textbooks and teachers and is very
complicated. In this paper, a rule commonly stated in all textbooks is denoted "Rc", a rule written in a textbook i but not included
in {Rc} is denoted "Rbi", and a rule given by a teacher Tj but not included in {Rc} or {Rbi} is denoted "Rtj". As music is a field of
art, the degree of requirement differs from rule to rule: some inhibition rules must be strictly observed while others need not be, and some
inhibition rules written in a textbook are modified or neglected by some teachers. Consequently, the Rcs, Rbis, and Rtjs are not in a simple
inclusion relation to one another, and students are confused by the complexity.
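The classification of rules into {Rc}, {Rbi}, and {Rtj} can be sketched as set operations. The rule names below are hypothetical examples; a real system would also have to model a teacher overriding or neglecting a textbook rule, which is what prevents a simple inclusion relation.

```python
# Hypothetical rule names illustrating the {Rc}, {Rbi}, {Rtj} classification.
Rc = {"parallel_fifths", "parallel_octaves", "augmented_second"}  # common to all textbooks
Rb = {1: {"hidden_fifths_outer"}, 2: {"doubled_leading_tone"}}    # Rbi: textbook i only
Rt = {"Tj": {"no_unison_crossing"}}                               # Rtj: teacher Tj only

def effective_rules(textbook, teacher):
    """The rule set a student under textbook i and teacher Tj must observe."""
    return Rc | Rb.get(textbook, set()) | Rt.get(teacher, set())

rules = effective_rules(1, "Tj")
# Note: a teacher may also modify or neglect a rule from Rc or Rbi; modelling
# that requires subtraction/override operations, not union alone.
```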
3. A CONCEPTUAL DESIGN OF A TRAINING SYSTEM FOR HARMONY THEORY AND ITS IMPLEMENTATION
3.1 A CONCEPTUAL DESIGN
Described here is a conceptual system for learning and practicing harmony theory, aimed mainly at beginners. The purpose of the system is not
to assign chords to a given melody, as many composition systems do, but to let students practice the inhibition rules of harmony
theory. The targets of the system are not only beginners but also those who have mastered the rules to some degree and want to
systematize their knowledge. For the latter group, it is desirable for the system to be able to show all the possible chord progressions,
accepting a partially correct chord progression assigned by the student. The system is expected to carry out the tasks of both basse donnée
and soprano donnée. In the case of basse donnée, for example, the system first provides the student with a bass sequence on which a
specific inhibition rule can be practiced; the student is supposed to have some knowledge of the rule he/she is going to practice. The
system generates appropriate bass subjects using a random number generator. The student's answer is fed to the system, which
judges whether the answer is allowable by referring to a set of registered rules. If the answer violates one or more rules, the
system is expected to point out the student's knowledge deficiency or misunderstanding of the rule concerned and to provide supplemental
exercises on the violated rule.
A brief conceptual view of the system is depicted in Fig. 1; the knowledge base in Fig. 1 is a set of inhibition rules.
Receiving a bass sequence, the system searches for allowable chord progressions observing the chord-function rules, chord-progression rules
and note-assignment rules. As the system generates all the allowable chord progressions, the user selects one of them, and according to the
chosen chord-progression ID the system then generates all the allowable assignments of the upper three voices for the specified chord
progression. The system then asks the user to choose one of the note-assignment IDs and shows the sequence of upper three voices for the
specified ID on screen in score format (top), together with its MIDI code (top right of the lower half), as shown in Fig. 3. Users can
specify what additional information is to be shown on the display besides the standard score elements: chord function, chord degree,
inversion type, cadence type, and allocation type. Here "chord function" means one of Tonic (T), Dominant (D), or Subdominant (S), and
"chord degree" expresses a chord by the interval number of its root counted from the tonic; a superscript 1 or 2 is added to the chord
degree when an inversion is applied. The cadence type, denoted K1, K2, or K3, is determined each time a tonic chord appears, based on the
foregoing chords, and the allocation type distinguishes standard from non-standard assignment, Dense (D) from Sparse (S) for standard
chords, octave (Oct) assignment, and a missing third.
Clickable buttons are displayed in bold characters and marks, while inactive buttons are somewhat faded at each stage of
user input; every operation is possible by clicking the mouse button. Users can listen to the chord progression as MIDI sound by clicking
the "Play" button. The system also provides facilities for checking the validity of answers to given bass sequences: users can specify
part of a chord progression and make the system show all the allowable chord sequences that match the partial chords specified by the
users. Exploiting this facility, users can check whether their answers violate any of the rules without inputting every note of their answer
into the system.
4.2 PERFORMANCE EXAMPLE
Figure 3 shows a display example for a given bass exercise sequence from a textbook (Ikenouchi et al., 1964). First, the user inputs an
exercise sequence by clicking the "Input subject" button and then clicking the chapter number and the subject number; the
system reads the bass sequence from data files and displays it on screen as a subject for exercise. If the user immediately wants to see the
output calculated by the system, he/she just clicks the "Search chord progression" button. The system then shows the total number of
possible chord progressions at the functional level, i.e., individual note assignment is out of scope at this stage. The user is then asked
to choose a chord-progression ID from those listed in a pull-down display; this is done in the input box for "chord progression ID" by
choosing one of the ID numbers shown in the box.
If the user wants to see the chord function assigned to each bass note, he/she puts a mark in the box "Show chord function and cadence";
if the user wants to know the chords in degree number together with inversion type, he/she puts a mark in the box "Show chord
progression". To check whether his/her answer is allowable, the user inputs some part of the answer by filling in the three input boxes
labeled "Restriction on any voice on any chord" at the left of the screen, specifying the position (the bass-note ID), the voice
(soprano/alto/tenor), and the note number in MIDI format. On clicking the "Search" button, the system searches for all
the allowable note assignments under the specified condition and displays the total number of allowable note assignments in the "Note
assignment ID" box at the top center. The user chooses an ID in the "Note Assignment ID" box, and the system shows the result both
in score format in the top window and in MIDI code in the "Details" box. The resulting chord progression can be confirmed by listening
to it via the "Play" button.
After trying this system, a lazy student majoring in piano performance said, "It seems unnecessary to study harmony theory now that we
have this system".
In general, the matching conditions required in Rp are looser than those required in any of the Rck (1 ≤ k ≤ m), which treat detailed cases
under the conditions described in Rp. So rule Rp should be adopted only if none of the Rck is adopted. For example, if Rp is a rule
concerning the chord progression V-VI, Rc1 a rule concerning V-VI-II, and Rc2 one concerning V-VI-IV, then Rc1 and Rc2 are detailed rules
under Rp, and Rp should not be adopted if either Rc1 or Rc2 is adopted.
"Invalidation" means that adoption of a rule kills other rule(s). We call a rule that invalidates other rule(s) Ri a "Killer rule" Rk as depicted
in Fig 6(b). An example of a killer rule is one concerning chord progression V-VI that invalidates a rule concerning progression between
root positions having no common notes. With adopting Rk, Ri is not necessarily observed, i.e. Ri should be invalidated.
The mutual relations among rules make it hard to extend a rule system implemented in algorithms. With the rule unit model, however, it
becomes much easier to construct the total system and to extend it with rules having mutual relations, though the initial cost is high.
(a) Parent/children relation
When a set of data enters a rule unit for the first time, the unit checks whether the rule is to be adopted for the input data by filtering with the corresponding conditional description. If the unit judges that it must apply the rule to the input, it investigates whether the rule has any child rules. If the rule has no child rule, it is immediately applied to the input data; otherwise the unit stores the number of child rules in a counter in the structured data and sends the data to the top rule unit Rc1 among the child units. Rc1 checks whether it is receiving the data for the first time. If so, it checks whether the data satisfy the conditions of Rc1. If they do, the main rule of Rc1 is applied to the data, and the parent rule Rp and the remaining child rules Rc2 through Rcm are ignored. Since Rc1 is a child rule of Rp, the counter is decremented by one to record the number of unfinished child units. The system then checks the counter. If the counter is zero, the rule unit just processed was the last child unit under the parent unit concerned, and the data are returned to the parent unit. If the counter is non-zero, the data are sent to the succeeding child unit, and so on until no child unit is left. If no child unit matches the data, the counter is reduced to zero and the data are sent back to the parent unit. In this case the data enter the parent unit for the second time, and the rule Rp is then immediately applied to them.
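The control flow described above can be sketched as follows. Class and attribute names are my own, and the counter bookkeeping is collapsed into a simple loop over child units; this is a minimal rendering of the scheme, not the paper's implementation:

```python
class RuleUnit:
    """Minimal sketch of a rule unit (names are mine, not the paper's).
    A parent rule is adopted only if none of its child rules matches,
    mirroring the counter-based control flow described above."""

    def __init__(self, name, condition, children=()):
        self.name = name
        self.condition = condition        # data -> bool: does the rule match?
        self.children = list(children)

    def process(self, progression):
        """Return the name of the single rule adopted for this input,
        or None if this unit's condition does not match at all."""
        if not self.condition(progression):
            return None
        for child in self.children:       # visit child units in order
            adopted = child.process(progression)
            if adopted is not None:       # first matching child wins;
                return adopted            # parent and later children are skipped
        return self.name                  # no child matched: adopt the parent

# The V-VI example from the text, encoded with hypothetical conditions
rc1 = RuleUnit("Rc1", lambda p: p[:3] == ["V", "VI", "II"])
rc2 = RuleUnit("Rc2", lambda p: p[:3] == ["V", "VI", "IV"])
rp = RuleUnit("Rp", lambda p: p[:2] == ["V", "VI"], [rc1, rc2])

print(rp.process(["V", "VI", "IV"]))  # -> Rc2 (Rp suppressed)
print(rp.process(["V", "VI", "I"]))   # -> Rp (no child matched)
```

Exactly one rule name is returned for any matching input, which is the "only one single rule in a rule chain" property the scheme aims at.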
The scheme described above realizes a simple control mechanism ensuring that only one rule in a rule chain is adopted while the other rules in the chain are suppressed.
A rule unit outputs its resultant judgement J for the input data, but since J is at present a binary variable, the system cannot express the "degree of goodness" of the chord progression given as its input. A quantitative evaluation of this "degree of goodness" is expected to be realized by introducing an analog value for J according to the degree of importance of the rule concerned. In this case, the value of J should be set high when the data violate an important inhibition rule. Implementing this mechanism requires two components: a hierarchical control system that handles meta-rules specifying dynamic priorities among rule units, and a simple threshold logic that rejects a chord progression when the sum of all the analog J values output by the rule units exceeds a certain value.
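A minimal sketch of how such threshold logic might look; the rule tests, weights, and threshold below are invented for illustration and are not taken from the paper:

```python
# Sketch of the proposed threshold logic: each violated rule contributes an
# analog judgement J weighted by its importance, and the progression is
# rejected when the summed output exceeds a threshold. The violation labels,
# weights, and threshold are hypothetical.
def evaluate(violations, weighted_rules, threshold):
    """violations: set of detected violation labels;
    weighted_rules: list of (violation_test, weight) pairs."""
    total = sum(w for test, w in weighted_rules if test(violations))
    return total, total <= threshold      # (badness score, accepted?)

rules = [
    (lambda v: "parallel_fifths" in v, 1.0),  # important inhibition rule
    (lambda v: "doubled_third" in v, 0.3),    # minor stylistic violation
]
print(evaluate({"doubled_third"}, rules, threshold=0.5))           # accepted
print(evaluate({"parallel_fifths", "doubled_third"}, rules, 0.5))  # rejected
```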
6.2 INVALIDATION OF A RULE
There are some rules that should be ignored under certain conditions. To realize this, a control mechanism for invalidation among rules is implemented in the rule unit system. A control signal Inv is sent from the killer unit to the unit to be invalidated through a channel connecting the rule units concerned. The relation between a killer unit and an invalidated unit is depicted in Fig. 6, where Rk denotes a killer unit and Ri an invalidated unit. When Rk is adopted, invalidation information is sent to Ri, which is thereby invalidated.
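The killer/invalidated relation could be rendered roughly like this; the class and the data fields are hypothetical, with only the unit names Rk and Ri taken from the text:

```python
class InvalidatableUnit:
    """Sketch of a rule unit that can be switched off by an Inv signal
    sent from a killer unit (hypothetical class; only the unit names
    Rk and Ri follow the text)."""

    def __init__(self, name, condition):
        self.name = name
        self.condition = condition
        self.invalidated = False
        self.targets = []                 # units this unit invalidates

    def kills(self, other):
        self.targets.append(other)        # open a channel to the victim unit

    def process(self, data):
        if self.invalidated or not self.condition(data):
            return False                  # rule not adopted
        for target in self.targets:       # adoption sends Inv down the channel
            target.invalidated = True
        return True                       # rule adopted

# Rk: the V-VI rule; Ri: the rule on root positions with no common notes
ri = InvalidatableUnit("Ri", lambda d: d.get("no_common_notes", False))
rk = InvalidatableUnit("Rk", lambda d: d.get("progression") == ("V", "VI"))
rk.kills(ri)

data = {"progression": ("V", "VI"), "no_common_notes": True}
print(rk.process(data))  # True: Rk adopted, Inv sent to Ri
print(ri.process(data))  # False: Ri has been invalidated
```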
R21 Connection between root position and first inversion |have same notes|
R22 Connection between root position and first inversion |don't have same notes|
R23 Connection between first inversions
R24 2nd chord |first inversion| -> 5th chord (root position)
R25 2nd chord |first inversion| -> 5th chord |first inversion|
R26 Any chord -> 2nd chord |first inversion|
R27 Any chord -> 2nd chord |first inversion, root note high, and sparse|
R28 5th chord (first inversion, having two 6th notes) -> any chord (root position, having any 5th note)
R29 Position |second inversion|
R30 Connection between root position and second inversion
R31 Connection between first position and second inversion
R33 1st chord (second position) -> 5th chord functions as Dominant
R34 1st chord -> 5th chord (second position) -> 1st chord |root position|
R35 1st chord |root position| -> 5th chord (second position) -> 1st chord
R36 1st chord -> 6th chord (second position) -> 1st chord
R37 1st chord |second position| -> 5th chord
R38 2nd chord or 2nd chord |root position| -> 1st chord |second position| -> 5th chord
R39 2nd chord |root position, root note high, and octave position| -> 1st chord |second position| -> 5th chord
R40 4th chord or 4th chord |root position| -> 1st chord |second position| -> 5th chord
7. CONCLUSIONS
Described in the first half of the paper are the design and user interface of a system that can algorithmically generate all the allowable note assignments to allowable chord progressions for a given bass sequence. The system can also be used to check whether a chord progression with concrete note assignments given by a student is allowable, by referring to a definite set of inhibition rules in harmony theory within triads. Functions and the user interface are described in detail, with some examples of the system's performance.
A rule unit model is proposed as a scheme for representing the system of inhibition rules in harmony theory, demonstrating that it can describe even complicated rules, including those having exceptions and those having invalidation relations. The proposed rule unit system will replace the functional basis of the current "Basse Donnée System", which generates all the allowable chord progressions for given bass sequences. The system is currently implemented as an algorithmic scheme, but will be reconstructed using the rule unit model proposed in the second half of the present paper. A driving mechanism for the rule unit model has been constructed, and a student model is in the conceptual design stage.
Acknowledgement
This work was partly supported by a Grant-in-Aid for Scientific Research (No. 10680395) from the Ministry of Education, Science and Culture, Japan.
REFERENCES
Ikenouchi, T., Shimaoka, J., et al. (1964). Harmony Theory: Theory and Practice. Ongaku no tomo-sha. (In Japanese.)
Proceedings paper
Generally speaking, the irrelevant speech effect (ISE) is an impairment in serial short-term memory tasks caused by simultaneous background speech or noise that is irrelevant to the primary task. This impairment is also observed when the task items are presented visually; hence the ISE is not due to acoustic confusion.
In general terms the ISE might be characterised as follows:
● below 90 dB, the magnitude of the effect is independent of the volume of background speech (Colle,
1980; Salamé & Baddeley, 1983; Salamé & Baddeley, 1987)
● the magnitude of the effect is independent of the meaningfulness of the background speech and even of the language presented in the background (Colle & Welsh, 1976; Salamé & Baddeley, 1982; Salamé & Baddeley, 1987; Jones, Miles & Page, 1990)
● generally, the ISE is believed to occur only in serial recall experiments (Salamé & Baddeley, 1990), but LeCompte (1994) and Beaman and Jones (1997) also found it in free recall, recognition, and paired-associate experiments.
● most researchers locate the ISE in the retention or rehearsal phase of the memory process, but recent results (Meiser, 1997) showed that it can be found in the encoding phase as well.
The ISE is an experimentally well-investigated phenomenon. Within the above-mentioned experimental restrictions, it can be replicated reliably.
At the moment, there are two competing interpretations of the ISE.
One interpretation, within the scope of Baddeley's model of working memory, explains the ISE as two processes drawing on the same cognitive resource (e.g. Baddeley, 1986, p. 89):
Acoustically presented speech-like sounds gain automatic access to the so-called phonological loop in working memory. The phonological loop is the place where verbal information is stored for short periods of time. The verbal target items of the memory task are stored in the phonological loop as well. But since the capacity of the loop is restricted, fewer items can be stored there when the irrelevant, acoustically presented material accesses the loop at the same time. Thus a reduction in the number of recalled items is observed in the retrieval phase. According to this interpretation, the magnitude of the effect depends on the phonological similarity between the target items and the irrelevant background material (e.g. Gathercole & Baddeley, 1993, p. 13).
A different interpretation of the ISE is given by Dylan Jones with the 'changing-state hypothesis' within his Object-Oriented Episodic Record model of working memory. According to Jones' model (e.g. Jones, 1993), the items of a serial recall experiment which the subjects are supposed to remember form a stream of objects that is stored in an amodal short-term store (blackboard). In this store the stream can be rehearsed, and thus the serial information of the items remains intact. If a second stream of objects enters the blackboard, the two streams disturb each other and the serial information of the events may be lost. In other words, according to this interpretation the magnitude of the effect depends on the similarity between neighbouring objects within one stream. This means that if the adjacent events of the irrelevant stream are very dissimilar, the result will be a stream with clearly noticeable serial information. Such a sequence disturbs the rehearsal of the target items to a large degree. The characteristic feature of such a stream is called 'changing state' by Jones et al.
Despite numerous experiments in recent years, the discussion of the two interpretations has not yet come to an end. According to some recent studies (Klatte, Kilcher & Hellbrück, 1995; Meiser, 1997), however, the changing-state criterion has proved to be a useful experimental variable which, in combination with Baddeley's working memory model, could account for a broader range of experimental data than either theory alone.
In recent years there have been a handful of studies carried out within the ISE experimental framework using
music instead of speech as irrelevant acoustic background material. The results of these experiments can be
summarised as follows:
● sine tones and real music can disrupt memory processes to the same degree as spoken syllables (Jones
& Macken, 1993; Klatte & Hellbrück, 1993).
● with music, higher-level perceptual factors (such as distraction or auditory streaming) seem to moderate the ISE (Nittono, 1997).
● instrumental as well as vocal music can produce the ISE (Klatte & Hellbrück, 1993; Klatte, Kilcher &
Hellbrück, 1995), although instrumental music might cause a smaller effect (Salamé & Baddeley,
1989).
● the changing-state criterion seems useful for predicting memory disruption by musical stimuli (Morris, Jones & Quayle, 1989; Klatte, Kilcher & Hellbrück, 1995).
All in all, the changing-state criterion seems to be decisive for producing the ISE with music. For musical sounds, changing state might be defined as pronounced segmentation of the auditory stream produced by sudden, irregular energy transitions in the acoustic spectrum (cf. Klatte, Kilcher & Hellbrück, 1995). This criterion is applied in the experiment described below.
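As a rough operationalisation of this criterion (my own sketch, not the authors' procedure), one can count sudden frame-to-frame energy jumps in a signal: a staccato-like stream yields many such transitions, a steady legato-like stream few. The frame length and decibel threshold below are invented:

```python
import numpy as np

def energy_transitions(signal, frame_len=1024, threshold_db=6.0):
    """Count sudden frame-to-frame energy jumps, a crude proxy for the
    'changing-state' character of an auditory stream (threshold invented)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1) + 1e-12        # avoid log(0)
    jumps_db = 10.0 * np.abs(np.diff(np.log10(energy)))
    return int(np.sum(jumps_db > threshold_db))

rng = np.random.default_rng(0)
# Staccato-like stream: loud bursts alternating with near-silence
staccato = np.concatenate(
    [np.concatenate([rng.normal(0.0, 1.0, 1024), np.full(1024, 1e-4)])
     for _ in range(8)])
# Legato-like stream: continuous sound at a steady level
legato = rng.normal(0.0, 1.0, staccato.size)

print(energy_transitions(staccato), energy_transitions(legato))
```

The staccato stream produces far more above-threshold transitions than the legato one, in line with the segmentation idea above.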
3. Context-dependent memory
The notion of context-dependent memory is related to the everyday experience that memory performance is enhanced when the retrieval situation resembles the learning situation. The starting point for the investigation of context-dependent memory (CDM), with the external surroundings defined as context, may have been an experiment by Godden and Baddeley (1975). Scuba divers were asked to learn simple words on land or underwater and to recall these words later in either of the two conditions. The divers who had the same conditions during learning and retrieval showed far better memory performance. This superior memory performance under matching learning and retrieval conditions is defined as the CDM effect.
In the last 25 years, many studies have been carried out within this theoretical framework using a variety of contexts, such as different rooms, indoor vs. outdoor settings, or even the presence or absence of the smell of chocolate (Schab, 1990). Steven Smith (1988) gives an overview of the studies conducted up to the late eighties.
Unfortunately, the phenomenon of context-dependent memory has proved to be rather unreliable. For example, Fernandez and Glenberg (1985) failed to produce a CDM effect under exactly the same conditions that Smith had used several years earlier (Smith, 1979) to demonstrate a CDM effect between different rooms. According to writers summarising the CDM literature, this unreliability is one of the hardest aspects of CDM to explain (e.g. Eich, 1985; Smith, 1988; Thompson & Davies, 1988; Roediger & Guynn, 1996).
Still, CDM effects have been observed using music as a context stimulus in memory experiments.
The conclusions that can be drawn from these 'music-dependent memory' experiments are the following:
● music may serve as a context stimulus to produce significant effects in a CDM experimental design (Smith, 1985; Balch, Bowman & Mohler, 1992; Thaut & de l'Etoile, 1993; Balch & Lewis, 1996).
● the musical tempo seems to be the decisive parameter in making the music 'feel' the same to the subjects (Balch & Lewis, 1996).
● the so-called mood-mediation hypothesis (Eich, 1995) may serve as a theoretical explanation for the musical CDM. According to this hypothesis, different musical tempi might induce different moods (or at least different levels of arousal) in the subjects, which make them perceive the musical selections as different (Balch & Lewis, 1996).
According to these conclusions, a CDM effect might be expected when two otherwise identical musical stimuli differ only in tempo.
4. The experiment
As described in the introduction, this experiment aims partly at revealing the mechanisms that account for the
effects of background music while subjects are engaged in a short-term memory task. Two candidate
mechanisms that describe the relation between the physical structure of music and memory performance
have been found in the literature. In the case of the irrelevant speech effect, clear and irregular segmentation of the musical stream is thought to impair verbal short-term memory. The CDM effects found in the study of Balch and Lewis (1996) were caused by different tempi in the musical background selections.
All the effects described above were observed in typical laboratory settings where most of the variables relevant to memory performance were controlled within the experimental design. Unlike those studies, the second aim of this experiment is to replicate the two effects in a so-called field experiment whose conditions come much closer to an everyday learning situation than the typical laboratory setting. In that way, I hope to answer the question of whether the two effects can be useful in explaining short-term memory performance in everyday life.
Experimental Design:
The experimental design was developed so that both effects - the ISE and the CDM - could be tested in one experiment. The design comprised five conditions. In four of them, music was played both while the subjects were presented with the target items and during recall. The fifth condition had no music during either the learning or the recall phase. Two versions of the same musical piece were used. One had a clear changing-state character and was considerably faster (CS-Version), while the other was a slower version with presumably less changing state (Non-CS-Version). The five experimental conditions followed the permutated design shown in figure 1:
Experimental condition Music during presentation Music during recall
A CS-Version CS-Version
B CS-Version Non-CS-Version
C Non-CS-Version CS-Version
D Non-CS-Version Non-CS-Version
E Silence Silence
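The four music conditions in figure 1 amount to the Cartesian product of the two versions over the two phases, plus a silent control; a small sketch (labels as in the figure):

```python
from itertools import product

# The two versions crossed over the two phases give conditions A-D;
# condition E (silence in both phases) is the control.
versions = ["CS-Version", "Non-CS-Version"]
conditions = {label: combo
              for label, combo in zip("ABCD", product(versions, repeat=2))}
conditions["E"] = ("Silence", "Silence")

for label, (presentation, recall) in conditions.items():
    print(f"{label}: presentation={presentation}, recall={recall}")
```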
After the experiment, the subjects were asked to fill in a questionnaire about their demographic data, their musical preferences, and their habits concerning learning with music in the background. In a closing discussion after the experimental session, the subjects were asked about their subjective impressions of the task, the experimental music, and their learning habits in general.
Task and material:
In the first phase of the experiment, the subjects were asked to memorise 25 simple German words taken from a standardised test of amnesia (Metzler, Voshage & Rösler, 1992) while listening to one of the musical versions in the background. Every word was presented for five seconds. After the 25 words had been presented once, the same words were shown again in a different order, with the intention of extending the exposure to the music to about 250 seconds. The presentation of the words was followed by a short break of two minutes, during which a completely different musical piece was played to prevent the subjects from remaining in the mood state induced by the first background music.
After the break, the subjects were asked to write down all the words they could remember in any order while
one version of the experimental music was played in the background. The recall phase lasted as long as the
presentation phase.
The music played in both cases was the Praeludium No. 8 from Book I of the Well-Tempered Clavier by J.S. Bach.
For the CS-Version, an interpretation by Glenn Gould characterised by pronounced staccato playing was used. This version of the praeludium was sped up with a digital audio editor without changing the pitch.
The Non-CS-Version was an interpretation of the same praeludium by Friedrich Gulda, marked by extensive use of the pedal and legato playing. It is this difference in articulation between Gould's and Gulda's interpretations that defines the changing-state criterion in this experiment. This feature proved sufficient to produce the ISE in a similar experiment with real music described by Klatte and colleagues (Klatte, Kilcher & Hellbrück, 1995, Exp. 4). Gulda's interpretation was slowed down considerably (with the pitch remaining the same), so that the duration of the CS-Version was only 33% of that of the Non-CS-Version. The intensity of the music was monitored with an analogue Minuphone sound level meter and ranged between 55 and 85 dB(A) for both versions, following the natural intensity changes of the music.
This design is obviously much closer to the experimental procedure used by Balch and Lewis in their 1996 CDM study than to the ordinary ISE experiment. Unlike the experimental conditions that have proved to produce the ISE reliably, free recall was used instead of serial recall; the target items were whole (but simple and short) words rather than digits or single letters; and the durations of the presentation, break, and recall phases were much longer than the usual 10 seconds. All these changes were meant to create a learning situation much closer to a real-life condition than a laboratory setting. The intention of this study was not simply to produce the ISE, but rather to find it in a situation very similar to the conditions under which people really learn, for example in a classroom. In any case, the possibility of finding an ISE under these modified conditions cannot be precluded. LeCompte, for instance, showed that the ISE can be detected under certain conditions with the free-recall paradigm as well (LeCompte, 1994), and according to a classroom study by Hygge, unpredictable noises with irregular fluctuations in noise level can impair complex learning tasks over longer experimental durations (Hygge, 1993).
Subjects and location:
279 German school children participated in the experiment; 97% of the subjects were aged 11 to 13. All of them attended a "Gymnasium", the most demanding school form in German secondary education. The subjects were tested in groups of about ten. The experiment was carried out in their own classrooms, which represented the most familiar learning environment for the subjects.
Results:
The results of the five experimental groups were expected to show an inhibiting effect of the CS-Version of the musical selection and an enhancing effect when the same music was played during learning and recall (conditions A, D and E). Thus group D (Non-CS-Version during learning and recall) or group E (silence in both situations) was expected to attain the highest scores, while groups B and C (CS-Version in one situation and Non-CS-Version in the other) were expected to show the worst memory performance.
Figure 2 shows the actual mean number of words correctly recalled for the five experimental groups.
Figure 2
A two-factor (2 music conditions during learning × 2 music conditions during recall) analysis of variance (ANOVA) was performed. Neither of the main factors was significant, F(1, 273) = 2.19, p = 0.14 for the learning condition and F(1, 273) = 1.47, p = 0.227 for recall, nor was the interaction between the two factors, F(1, 273) = 1.21, p = 0.27.
Although the mean score of group B was significantly higher at the 5% level, according to one-tailed t-tests, than the scores of all the other groups, a one-way ANOVA with the five groups as levels of the factor revealed no significant F-value, F(4, 273) = 1.46, p = 0.21. This result was confirmed by the non-parametric Kruskal-Wallis test computed along with the one-way ANOVA, p = 0.27. T-tests for the mean differences of the other pairs of groups did not reach an acceptable level of significance.
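For illustration, the one-way ANOVA F statistic reported above can be computed from group scores as follows. The scores here are synthetic (the study's raw data are not available), so the sketch shows only the computation, not the reported result:

```python
import numpy as np

def one_way_anova(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    scores = np.concatenate(groups)
    grand_mean = scores.mean()
    k, n = len(groups), scores.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Synthetic recall scores for five hypothetical groups of equal size
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=3.0, size=56)
          for m in (10.0, 11.5, 10.0, 10.0, 10.0)]
f, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```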
The results were checked against some of the variables examined in the questionnaire. Only two variables exhibited a significant relation to the results of the memory test.
Firstly, those pupils who indicated on the questionnaire that they had been disturbed by the music during the experiment had significantly worse test results (ANOVA: F(2, 204) = 3.259, p = 0.04). At the same time, subjects who said they never listened to classical music felt disturbed by the experimental music in a significantly greater proportion than pupils who listened to it sometimes or regularly (Pearson chi-square test: value 14.053; df 4; two-tailed significance 0.007).
Secondly, a significant relation between the subjects' sex and their memory performance was found. Girls remembered on average 1.77 words more than boys, which led to a highly significant p-value of p < 0.001 in the Mann-Whitney U-test. As the proportion of girls and boys differed across the five experimental groups, a multiple regression analysis was performed with five dichotomous variables (sex and four dummy variables representing the experimental conditions). Only the variables sex and experimental condition B turned out to have significant beta weights (sex: standardised beta = 0.248, p < 0.001; condition B: standardised beta = 0.153, p = 0.34). The corrected R²-coefficient for the global regression model was only 0.058. Thus it is not possible to explain the test results by sex and experimental condition alone. The distribution of the standardised residuals of the regression model approximated the normal distribution, so the data gave no hint of other systematic influence factors that might have been overlooked in the regression model.
5. Discussion:
First of all, the girls' better results can be explained easily. A female advantage in access to and use of elaborate memory strategies up to early adulthood has frequently been reported in the literature on memory development (e.g. Cox & Waters, 1986; Waters & Schreiber, 1991; Jones et al., 1996).
With the exception of group B, all groups achieved almost the same mean results in the memory test. The better performance of group B was completely unexpected, because according to the CDM the different versions of the music during learning and recall should have impaired its results. Additionally, the CS-Version in the learning phase should have had a detrimental effect as well. It is very unlikely that the subjects of group B simply had better memories, because group B consisted of 62 subjects who came from four different classes at three different schools. The other experimental groups consisted of subjects similarly distributed over different classes and schools. Hence, differences in memory performance between classes should have levelled out across the five experimental groups.
Generally speaking, then, these results allow the conclusion that the two effects did not play an observable role in this experiment. They may have been overridden by other factors that become more important when real music is used instead of artificial sounds and learning takes place outside a psychological laboratory. Two candidate factors in this experiment might have been musical preferences and the use of mnemonic strategies.
The statistically significant relationships between a low frequency of listening to classical music, rating the experimental music as disturbing, and inferior memory performance favour the interpretation that musical preferences may have played a crucial role in this experiment.
In most of the closing discussions after the experimental sessions, subjects reported that they had used some kind of mnemonic technique to commit the words to memory. This was of course favoured by the free-recall retrieval mode and by the use of meaningful words as target items. The significantly better performance of the girls, who presumably made use of their developmental advantage in this respect (see above), is indirect evidence of the important role of elaborate memory strategies.
Of course, it is not known whether these factors are indeed responsible for overshadowing the predicted memory effects. But the fact is that these assumed influence factors are poorly integrated, if at all, into the theoretical concepts of the ISE and CDM: musical preferences are not even mentioned in either theory. Memory techniques play no role in the ISE concept because a serial rehearsal strategy is always presupposed (apart from telephone numbers, serial rehearsal is used in very few real-life situations). In CDM theory, only the relation of target items to environmental cues is theorised as a memory technique.
In summary, the failure to produce the predicted memory effects and the assumed influence of factors outside the theoretical framework lead to the conclusion that the irrelevant speech effect and context-dependent memory alone are of little use in describing memory performance under the influence of background music outside the psychological laboratory.
Influential factors such as attitudes toward and preferences for background music in general, preferences concerning the chosen experimental music, and the memory techniques employed by the subjects should be controlled in a future experiment. The question of whether and how the physical structure of music affects memory processing can probably be asked again only once at least the above-mentioned factors are controlled within the experimental design.
References
Baddeley, A. (1986). Working memory. Oxford: Oxford University Press.
Balch, W., Bowman, K. & Mohler, L. (1992). Music-dependent memory in immediate and
delayed word recall. Memory & Cognition 20 (1), 21-28.
Balch, W. & Lewis, B. (1996). Music-dependent memory: the roles of tempo change and mood
mediation. Journal of Experimental Psychology: Learning, Memory, and Cognition 22,
1354-1363.
Beaman, C. & Jones, D. (1997). Role of serial order in the irrelevant speech effect: tests of the
changing state hypothesis. Journal of Experimental Psychology: Learning, Memory, and
Cognition 23, 459-471.
Behne, K.-E. (1999). Zu einer Theorie der Wirkungslosigkeit von (Hintergrund-)Musik. In K.-E.
Behne, G. Kleinen & H. de la Motte-Haber (Eds.). Musikpsychologie Bd. 14. Göttingen:
Hogrefe. pp 7-23.
Carlson, J. & Hergehahn. (1967). Effects of rock-n-roll and classical music on learning of
nonsense syllables. Psychological Reports 20, 1021-1022.
Cockerton, T., Moore, S. & Norman, D. (1997). Cognitive test performance and background
music. Perceptual and Motor Skills 85, 1435-1438.
Colle, H. (1980). Auditory encoding in visual short-term recall: effects of noise intensity and
spatial location. Journal of Verbal Learning and Verbal Behavior 19, 722-735.
Colle, H. & Welsh, A. (1976). Acoustic masking in primary memory. Journal of Verbal
Learning and Verbal Behavior 15, 17-31.
Cox, D. & Waters, H.S. (1986). Sex differences in the use of organization strategies: a
developmental analysis. Journal of Experimental Child Psychology 41, 18-37.
Davidson, C. & Powell, L. A. (1986). The effects of easy-listening background music on the
on-task-performance of fifth-grade children. Journal of Educational Research 80 (1), 29-33.
Drewes, R. & Schemion, G. (1992). Lernen bei Musik: Hilfe oder Störung? In K.-E. Behne, G.
Kleinen & H. de la Motte-Haber (Eds.). Jahrbuch der Deutschen Gesellschaft für
Musikpsychologie Bd. 8. Wilhelmshaven: Noetzel. pp 46-66.
Eich, E. (1985). Context, memory, and integrated item / context imagery. Journal of
Experimental Psychology: Learning, Memory, and Cognition 11, 764-770.
Eich, E. (1995). Mood as a mediator of place dependent memory. Journal of Experimental
Psychology: General 124 (3), 293-308.
Fernandez, A. & Glenberg, A. (1985). Changing environmental context does not reliably affect
memory. Memory & Cognition 13, 333-345.
Gathercole, S. & Baddeley, A. (1993). Working memory and language. Hove, Hillsdale:
Lawrence Erlbaum Associates.
Godden, D. & Baddeley, A. (1975). Context-dependent memory in two natural environments:
on land and underwater. British Journal of Psychology 66, 325-331.
Greenberg, R. & Fisher, S. (1971). Some differential effects of music on projective and
structured psychological tests. Psychological Reports 28, 817-818.
Hagemann, H.W. & Schürmann, P. (1988). Der Einfluß musikalischer Untermalung von
Hörfunkwerbung auf Erinnerungswirkung und Produktbeurteilung: Ergebnisse einer
experimentellen Untersuchung. Marketing ZFP 4, 271-276.
Hygge, S. (1993). Classroom experiments on the effects of noise on long term recall and
recognition in children aged 12-14 years. In A. Schick (Ed.). Contributions to psychological
acoustics. Oldenburg: Bibliotheks- und Informationssystem der Universität Oldenburg. pp
627-641.
Jones, D. (1993). Objects, streams, and threads of auditory attention. In A. Baddeley & L.
Weiskrantz (Eds.). Attention: selection, awareness, and control. Oxford: Clarendon Press. pp
87-104.
Jones, D. & Macken, W. (1993). Irrelevant tones produce an irrelevant speech effect:
implications for phonological coding in working memory. Journal of Experimental Psychology:
Learning, Memory, and Cognition 19, 369-381.
Jones, D., Miles, C. & Page, J. (1990). Disruption of proofreading by irrelevant speech: effects
of attention, arousal or memory?. Applied Cognitive Psychology 4, 89-108.
Jones, M., Yokoi, L., Johnson, D., Lum, S., Cafaro, T. & Kee, D. (1996). Sex differences in the
effectiveness of elaborative strategy use: knowledge access comparisons. Journal of
Experimental Child Psychology 62, 401-409.
Kerr, W. (1945). Experiments on the effect of music on factory production. Applied
Psychological Monographs 5, 1-40.
Proceedings paper
The term "aural training" has been applied to many different skills, from solfège singing lessons, tapping rhythms, singing scales, and writing down dictations to discriminating musical textures, forms, and timbres (Henson 1987). In fact, aural awareness can be seen as an ability concomitant with any musical activity, but in this work I am concerned with the concept of aural training focused on the teaching of Standard Musical Notation (henceforth SMN).
The use of notation is a constitutive feature of Western music. In its origin, notation was simply a tool for remembering tunes, a mnemonic device that really did work: all music prior to this century has reached us by notational means. And it is precisely that usage which still mainly justifies its teaching in the conservatories: to give access to the vast "written" repertory accumulated from the Renaissance to our days. Important as this goal may be, its status has been erroneously exaggerated in conservatory academic circles to the point of being considered the main goal of music theory courses.
Western tonal music grew increasingly dependent on symbols, in both compositional and performance terms, not only to meet the needs of recording and transmission but also to achieve high levels of complexity. If in its origin notation was only a kind of by-product of music, over the centuries notational symbols in turn also shaped the way of thinking about music (cf. López Puccio 1978). If we consider notation as the final output of the cognition of music, as the formalization of abstract relations among sounds, the usefulness of any notation for music teaching and learning becomes evident. When we put musical ideas on paper, they "hold still" and "talk back" (Bamberger 1991: 118), and this two-way confrontation is central to the development of musical understanding, as it was throughout the history of Western music.
The above arguments justify my view that music literacy - by means of any notational system - is paramount for the study of music, and not just an accessory skill: it is essentially "an act of reflection . . . of which features are most salient and in need of recording" . . . [a] "post hoc consideration of a piece of music" (Serafine 1988: 37). Accordingly, we can assume that any attempt to teach notation should start by re-constructing this sequence in the novice music learner, guiding reflection on the features that are prominent in his/her representation of the music. To do that, we must first consider how the illiterate listener listens to music.
FIGURAL AND FORMAL KNOWLEDGE
How people listen to music
Musical listening is a multidimensional experience. The physical stimulus offers us many interacting dimensions -
rhythm, melody, harmony, timbre, text - integrated in a perceptual continuum. To this wholeness our cognition adds even
more layers of "meaning", ranging from low level grouping strategies involving a few events to abstract relations at
higher structural levels, and also aesthetic and emotional responses:
"the sense made of phenomena is always a construction . . each of the individuals finds in the material and
thereby gives existence to aspects that simply does not exist for the other. For the person who attends the
metric aspects of rhythm, figures [in the sense of figural knowledge] remain unrecognised; for the person
who attends to figures, the classification of events according to their shared duration remains inscrutable"
(Bamberger 1991: 29)
Figural knowledge (sometimes called functional knowledge) refers to the kind of global, continuous apprehension of the musical phenomenon that gives instantaneous access to a holistic representation of the music. Our cognition keeps track of all music parameters simultaneously, assigning each surface event a function inside a figural representation that is highly dependent on the surrounding context: "events within a figure are contextually bound" (Upitis 1987: 41). Usually a minor alteration in a single dimension, or the slightest attempt to break down this functional relation into its different constituents, is enough to alter, or altogether lose, the musical identity of the figure.
Formal knowledge (sometimes called metrical knowledge) stands for a completely different kind of music apprehension. It is concerned with those aspects of the musical stimulus that can be counted, measured and classified, mostly in terms of proportional durations and frequency ratios, and it comes into play when a conventional representation of music, as in SMN, is needed.
Figural apprehension is the kind of knowledge involved in several different responses to music - aesthetic, emotional, kinaesthetic - that involve some kind of meaning. The transit from a figural mode to a formal one may be conceived in relative terms: any time we integrate a formal concept into a context, we are moving up in the hierarchy and giving rise to a figural concept. Where structural components are involved, it could be said that figural knowledge gives access to a higher hierarchical level that shapes the meaning of individual events; and vice versa, to formalise, in the context of this work, means to access a lower level in a hierarchical representation of music. However, the figural and formal modes are not mutually exclusive, nor can we imagine either of them in a pure form. While most of the time both figural and formal modes alternately contribute to music cognition, their contributions may be considered asymmetrical. If a "purely" figural approach is in a sense limited, because it has restricted access to formal knowledge, this fact does not diminish its musical value. A purely formal approach, on the contrary, is limited in a more important way, since no formal knowledge is musical without the meaning added by the figural dimension. Presumably a musically literate musician can switch from one mode of perception to the other at will, though I suspect that there is always a mismatch between the two modes: our figural perceptions always extend beyond what we are able to formalise; we always ‘know more than we can tell’.
The figural - formal transaction
The teaching of notation - certainly a complex formalization - consists in enabling the learner to perform a fluid interplay between both modes. At the very beginning the learner needs to switch from his/her figural representation to a formal one, a process that we call the figural - formal transaction. The crucial point is how to integrate both dimensions in the learner’s representation: one is intuitive, continuous and holistic; the other is rational, discrete and analytical. When we are perceiving in a figural mode we do not hear each note in the music; rather, "a non-linear relationship [exists] between the notes in the score and what people hear when they listen to a performance of it" (Cook 1994: 79). The formalization needed for notation goes exactly the opposite way: it replaces a figural experience with a formal description, "a meaningful continuity with a meaningless particularity" (Cole 1974, cited by Terry 1994: 104). It is this wholeness that makes it so difficult to determine exactly all the processes involved in the figural experience of music, but certainly grouping, beat and meter abstraction, perception of regularities like pattern alternation or repetition, segmentation, and awareness of tension-relaxation schemas can be considered kinds of figural apprehension. One central contention of this work is that when we "rescue" our students from that undifferentiated sea of music, when we guide the first transactions between their figural and formal knowledge, if we fail to establish the right first link between the two modes of representation, it will probably never be easily and naturally restored; or it will be restored only by those exceptional individuals who are able to find their way alone in that tempestuous sea.
Pedagogical implications: modelling the learner’s cognition
The formalization required for SMN necessarily involves some kind of categorisation of these figural perceptions - making discrete something that is naturally continuous - and categorisation in turn involves a finite number of categories within a given continuous dimension. If we assume that possible figural hearings are infinite and formal representations are finite, we reach the very obvious conclusion that two or more figural representations may have, depending on the level of analysis, a common formal representation; and also the opposite, that a formal representation - no matter how detailed it may be - will have more than one figural representation. This is so because the figural experience is holistic and is automatically affected by the context. A higher hierarchical organisation can always be preserved over a theoretically infinite number of surface (low level) features. Conversely, a formal construct may preserve its integrity while its figural function is affected by an infinite number of contextual, higher level frames of interpretation. Example 1 shows two cases of formalizations coupled with different figural interpretations. In both cases the figural interpretation involves the consideration of a context operating at a level higher than the one at which the formalization operates.
Example 1
[Musical notation not reproduced: a single formalization of three events, paired with different figural interpretations depending on stress and context.]
What puzzles the learner in his/her first contacts with SMN is the finite, relatively limited number of concrete means of notation (musical figures) - a characteristic inherent to any symbolisation - and the consequently infinite number of musical "faces" that those symbols may adopt in different contexts:
"mistake[s] can be traced to a mismatch between internal mental representation and conventional
descriptions, [and so] it is useful to help students confront these differences. In this instance, that would
mean helping students to move back and forth between metric and figural hearings and between metric and
figural descriptions of a rhythm or melody" (Bamberger 1991: 66-7).
The usual strategy of ear training pedagogy is to present as many examples of formal configurations as possible, each one tied to a unique figural representation. This one-to-one relationship obscures the listener’s active role in the construction of these figural meanings from a neutral notation, and consequently the student achieves, at best, a figural - formal transaction that is rigid and inflexible. He/she finds it difficult to distinguish one mode of representation from the other, and has trouble differentiating between the aspects of his/her figural experience that can be reflected by notation and those that notes cannot portray, no matter the degree of detail in the score’s specifications.
This dialogue between the specifications present in the notation and the sense that the learner makes of them is of crucial importance and must be carefully aided. It could be said that the role of the teacher is to model the learner’s cognition by helping to disentangle both kinds of representations while allowing a fluid interplay between them. Even a single error, a small mismatch between these two dimensions, may be catastrophic for any further learning. The didactic strategy proposed here is in a sense opposed to the traditional one: it consists in presenting the learner with as many as possible of the different figural representations that a single formal representation can stand for.
To learn what is already known
How can extant psychological studies inform an ear training pedagogy guided by the conceptual framework outlined so far? One assumption of this approach to the teaching of notation is that formal knowledge can only be constructed from figural knowledge. In this conception the figural mode is not an imperfect or preliminary form of knowledge but a foundation, and in turn also a goal, of the formal one. If we find that every member of a culture is able to construct a figural representation of music, that implies that every person within the range of standard perceptual and cognitive abilities should be able to learn to read and write music.
Enculturated adults automatically process enormous amounts of musical information. Exposure to the music of the culture, together with the innate cognitive machinery for structuring the environment, gives rise to an ‘implicit knowledge’ that can be found in most people. This ability consists mainly in grouping musical sounds and giving them meaning inside a culture-driven pattern (cf. Dowling 1999). This musical competence is relevant here because it seems to involve similar processes in both musicians and non-musicians. If we turn to experiments concerned with some of the different processes involved - like grouping (cf. Deliège 1987), the perception of tensing-relaxing patterns (cf. Bigand 1993) and of underlying hierarchic structures (cf. Serafine, Glassman and Overbeeke 1989) - we find that naive listeners indeed perform very similarly to musicians. It seems that musical training does not substantially affect the ability of normal adults to respond to the figural aspects of music, but rather that "music perception is fundamentally similar in listeners with varying degrees of sophistication" (Trehub, Schellenberg & Hill 1997: 104).
There is also enough evidence to assume that the processes involved in this musical competence develop gradually from the pre-natal stage (cf. Lecanuet 1996) and through childhood as a result of normal exposure to music. Infants group auditory information according to Gestalt principles of proximity and similarity in ways similar to those used by adults (Demany 1982; Fassbender 1993; Thorpe and Trehub 1989; Thorpe, Trehub & Morrongiello 1988), can discriminate between different rhythmic and melodic patterns (Chang and Trehub 1977a-b), and are sensitive to phrase structure (Krumhansl & Jusczyk 1990; Jusczyk & Krumhansl 1993). These and other cognitive processes involved in the figural apprehension of music - like the understanding of tonal closure and harmonic relationships - develop steadily through childhood and "are generally well in place in human cognition by the age of 10 or 11 years" (Serafine 1988: 224).
This challenges the traditional conception of aural training pedagogy, which treats the learner as if his/her first contact with music took place in the conservatory classroom. If the music theory teacher assumes valuable pre-existing musical knowledge in the learner, the process of formalization will consist simply in learning "what he/she already knows" (Bamberger 1991: 259). The transition from figural to formal will consist simply in putting names to "things" that the novice already perceives and notices. In that sense the job of the teacher is not to teach but just to show.
When a person listens to a given piece she recognises - understands - several characteristics of the music in a straightforward manner. The crudest, most basic formalization of that understanding is the discrimination between that precise piece and another, even if she could not rationalise the difference. In that case she "knows more than she can tell". That situation of the novice learner differs only in degree from an experience that any musician knows: to listen to some sounds or a musical passage, to perceive and understand the function, the "meaning", but not be able to write it down or play it on the piano: "that sounds like a cadence but I don’t know exactly what the chords or the inversions are". Musicians and aural training teachers, whatever their audio-analytic sophistication, are continuously faced with passages, chords and musical relations that they "understand" but are not able to formalise.
But if the average listener can indeed make all the subtle discriminations involved in the "musical" apprehension of sounds, why is it that not everybody can learn to formalise this knowledge in the form of SMN? I argue that when people fail to understand the basis of music notation, it is because aural training didactics fails when it comes to formalising their pre-existing musical knowledge: the "units of perception" do not match the "units of description" (cf. Bamberger 1991: 8).
One of the central contentions of this work is that, with adequate teaching, almost every musically enculturated adult should be able to learn the basics of music reading and writing. We only need to find means by which the learner can profit from what he/she already knows. If learners do not find bonds between their percepts and the formalizations required for notation, the music teacher will simply be, as the saying goes, "answering questions that the learners have never asked themselves", and that knowledge will never become operative. I propose that the best way to establish links between figural and formal percepts in music is to draw on the extant figural knowledge of the learner and to exploit any previously acquired abilities in general auditory pattern processing. In the following section I present some brief comments and suggestions on how a didactic approach to the teaching of rhythm could take these premises into account.
A DIDACTIC APPROACH TO RHYTHMIC TEACHING
The key didactic issue is to find the kind of formal description of rhythm that the novice can most easily access, the one that is closest to his/her figural description. The crucial question for the aural training teacher is how to lead the learner’s rhythmic processing beyond basic cognitive mechanisms like regularity extraction and segmentation into groups (cf. Drake 1998). What is the next step that the learner should take towards a formalization of his/her perceptions of uniformity and change in the rhythmic flow? At this point the usual focus in ear training courses is to start from the metric aspects of rhythm. Traditional pedagogy has assumed that, since everyone can perceive an underlying beat, learners can easily use it as a temporal ruler to measure the length of the events that group over the pulsed field. But, as we have seen, that is an "after the fact" reasoning that assumes a metrical "skill" nonexistent in the learner. I argue that the next stage in the formalization of rhythmic perception should rely not on the metric aspects of groups of events, but on the functional relations among them. Let us now examine which functional relations in a group can be most easily formalised (I use the word functional instead of figural because it reflects more accurately the idea of the relatedness of one event to another, or others, in a group).
Speech and music prosody
The grouping of events - a functional perception - takes place within the framework of "a regular pattern of strong and weak beats to which he [the listener] relates the actual musical sounds" (Lerdahl & Jackendoff 1983: 12), called the metrical hierarchy. It seems evident that the novice’s first attempts at the formalization of rhythmic groups are most conveniently made at a single level of the metrical hierarchy and without loading long term memory - that is, weak beats grouped around strong ones - and speech prosody offers precisely an example of that kind of grouping. Since most music in the Western tradition can be interpreted in terms of figural units that can be described in prosodic terms, I propose that the figural-formal transaction should start with the formalization of rhythmic groups in terms of prosodic units. Prosody is the overall acoustical profile of a spoken or chanted utterance that results from the organisation of weak discrete elements - syllables - around stressed ones; the term stress designates perceptual salience achieved by any means, such as dynamic changes, lengthening of sounds, pauses, etc. In speech comprehension, language prosody plays a paramount role (cf. Cutler, Dahan & Donselaar 1997, Pynte 1998), and the same is true for music. We can imagine the musical parallel when we chant a melody in a monotone, halfway between speech and singing. In the absence of accurate melodic or rhythmic information, we can still keep track of its structure relying solely on the information carried by the prosodic profile of the output: almost any music can be reproduced, or mimicked, with variable degrees of precision by the human vocal apparatus.
Both speech utterances and music involve the perception of auditory patterns within a framework of sequential and hierarchical cognitive processing, where continuous sound strings are analysed into discrete phonemes or notes. For language and music reading, the opposite path is followed: a symbol string is decoded to produce a spoken/sung/performed output. These continuous strings of syllables or sounds, as they unfold in time, are structured around metrical stresses that determine the prosodic profile of the phrase. The similar nature of constituent marking in speech and music (cf. Carlson, Friberg, Frydén, Granström & Sundberg, 1989) indeed suggests the existence of shared cognitive mechanisms for grouping in language and music (cf. Pinker 1997: 535, Fassbender 1996: 80).
The first didactic work with prosody must start with the distinction between stressed and unstressed events in a sound string. This is not a trivial point, since, as we have seen, in the multidimensionality of the listener’s experience it may be difficult to keep track of the salience of events along a single dimension. But the formal processing of speech sentences - a well learned process in most schooled adults - constitutes a clear example of the grouping of events (syllables) along a single temporal dimension. The syllable is the basic rhythmic unit of Spanish, and the prosodic profile of an utterance is shaped by the way in which weak syllables group around stressed ones. One of the most characteristic features of the spoken language is that all the syllables in a rhythmic group, whether stressed or unstressed, tend to follow each other at more or less evenly spaced intervals of time. The identification of stressed syllables in words and sentences may thus be used as a common cognitive framework for the processing of musical events. By splitting words into syllables and imposing a steady beat, we can easily bring our students to perceive the prosodic profile of an utterance and to transfer it to SMN. In a typical didactic sequence we can ask the student:
● to recite a sentence following a steady beat, assigning one syllable to each beat
● to transfer the metrical structure of the utterance to music notation - establishing a correspondence between strong and weak syllables and strong and weak musical beats - by replacing the syllables with any conventional music symbol (Example 2).
Example 2
Es - ta ca - si - ta tie - ne ven - ta - nas
|    |    |    |    |    |    |    |    |    |    (a steady beat, one syllable per beat)
[The original example repeats the sentence with the syllables replaced by conventional music symbols; that notation is not reproduced here.]
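The two steps of this didactic sequence can be sketched computationally. In the following illustrative Python sketch, the syllable split and the stress positions of the sentence are hand-annotated assumptions (a real implementation would need a syllabification and stress dictionary for Spanish), and the function name is ours:

```python
# Step 1 of the didactic sequence: one syllable per beat.
# Step 2: mark which beats carry a lexical stress ('>') and which are weak ('.').
# Syllables and stress positions are hand-annotated for this illustration.
syllables = ["Es", "ta", "ca", "si", "ta", "tie", "ne", "ven", "ta", "nas"]
stressed = {0, 3, 5, 8}  # ES-ta, ca-SI-ta, TIE-ne, ven-TA-nas (assumed)

def beat_grid(syllables, stressed):
    """Return one mark per beat: '>' for a stressed syllable, '.' for a weak one."""
    return [">" if i in stressed else "." for i in range(len(syllables))]

print(beat_grid(syllables, stressed))
# → ['>', '.', '.', '>', '.', '>', '.', '.', '>', '.']
```

The resulting grid is exactly the "metrical structure" the student is asked to transfer to notation: one symbol per beat, differentiated only by stress.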
Once the metrical structure is established, the further identification and description of prosodic constituents follows. Each prosodic unit, or foot, is organised around a metrical stress. The formalization of these feet in the learner’s representation must consist in distinguishing the different ways in which syllables/events organise around this stress. Since the same metrical structure allows several prosodic interpretations, depending on the grouping, the different groups/feet that unstressed syllables/events may form around metrical stresses can be made evident to the learner by interchanging different words to match a given metrical structure (Example 3).
Example 3
Es - ta ca - si - ta tie - ne ven - ta - nas
[The original example shows different words fitted to the same metrical structure, making alternative feet evident; the notation is not reproduced here.]
In a typical analytic-reading activity the learner is asked to search for all the possible figural interpretations, that is, all the possible prosodic feet that could be formed, following the rule that each of them must contain one, and no more than one, metrical stress (or head). In that way, for instance, a 4 bar rhythmic phrase will present 4 different groups, each "wrapped" around a head. The feet can be kept similar among themselves, within the possibilities of the phrase structure, or different feet may be combined (Examples 4 and 5).
Examples 4 and 5
[Musical notation not reproduced: two groupings of the same phrase, one with similar feet (Example 4) and one combining different feet (Example 5).]
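The reading rule above - one and only one head per foot - can be sketched as a grouping function over a grid of stressed ('>') and weak ('.') beats. This is only one of several legal groupings (here every foot opens with its head; feet that close on a head would be equally valid, which is exactly the interpretive freedom the analytic-reading activity explores), and the function name is illustrative:

```python
def feet(grid):
    """Partition a beat grid ('>' = head, '.' = weak beat) into prosodic feet,
    each containing exactly one head. This grouping opens every foot on its
    head; other legal groupings exist, as the analytic-reading task explores."""
    groups, current = [], []
    for mark in grid:
        # a new head closes the previous foot, provided that foot has its head
        if mark == ">" and ">" in current:
            groups.append(current)
            current = []
        current.append(mark)
    if current:
        groups.append(current)
    return groups

# Four heads in the phrase give four feet, each "wrapped" around a head.
print(feet([">", ".", ".", ">", ".", ">", ".", ".", ">", "."]))
```

Enumerating the alternative partitions that satisfy the same one-head rule would give the learner the full set of figural interpretations the activity asks for.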
But rhythmic teaching must also go on to help the grasping of those groupings that do depend on gross variations in the rate of appearance of events, like the relations expressed by conventional figures. These relations are indeed very few and simple: patterns composed of a few events whose attack points stand in relations of multiples of two and three; further rhythmic complexity is achieved through a hierarchical arrangement of these basic "rhythmic contours" (Monahan 1993: 127). In traditional aural-training settings the confusion between absolute and relative duration is perpetuated by the explicit taxonomy of contents, which presents as different several groupings-of-figures that in fact share a common figural grouping when transposed to another metrical hierarchy (i.e., the groups half note - two crotchets and crotchet - two quavers are metrically (formally) different but figurally identical).
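The distinction drawn in that parenthesis is easy to verify arithmetically: once durations are expressed as ratios rather than absolute values, the two groupings collapse into one contour. A minimal sketch (durations in beats; the function name is illustrative):

```python
from fractions import Fraction

def contour(durations):
    """Normalise durations to ratios of the first event, so that groups which
    are transpositions of each other across metrical levels compare equal."""
    ratios = [Fraction(d).limit_denominator(64) for d in durations]
    return [r / ratios[0] for r in ratios]

half_plus_two_crotchets = [2, 1, 1]        # half note + two crotchets
crotchet_plus_two_quavers = [1, 0.5, 0.5]  # crotchet + two quavers

# Formally different symbols, figurally the same 2:1:1 contour.
print(contour(half_plus_two_crotchets) == contour(crotchet_plus_two_quavers))  # True
```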
The conflict arising from this ambiguity is surely one of the factors responsible for the mismatch between novices’ figural and formal representations mentioned in previous sections. I argue that the formalization of these hierarchical transpositions should come only after the basic relationships of rhythmic contours have been grasped at a single metric level. By introducing whole-beat rests into our ‘elemental’ prosodic phrases we can convey a whole range of time-span proportions without the burden of introducing different durational symbols. Since the goal is not to teach relations among figures but among time spans, this method allows the learner to grasp the concept of duration as an abstract relation between time spans, rather than a property of the figures themselves. A whole range of rhythmic contours can be conveyed using no more than one metrical level (Example 7). Only after these rhythmic contours have been formalised does the "transposition" to other hierarchical levels, symbolised by the different range of durational figures, make sense.
Example 7
[Musical notation not reproduced: equivalent rhythmic contours expressed with whole-beat rests at a single metrical level.]
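The whole-beat-rest device can be made concrete in the same sketch style: on a single metrical level, a pattern of attacks ('X') and whole-beat rests ('.') already encodes a range of time-span proportions without any new durational symbol. One illustrative assumption here is that each event's time span runs until the next attack, or to the end of the pattern:

```python
def time_spans(pattern):
    """Time span of each attack = number of beats until the next attack
    (or the end of the pattern). 'X' = attack, '.' = whole-beat rest."""
    attacks = [i for i, beat in enumerate(pattern) if beat == "X"]
    edges = attacks + [len(pattern)]
    return [edges[k + 1] - edges[k] for k in range(len(attacks))]

# One note symbol plus rests yields a 2:1:1 contour without introducing
# half notes or quavers -- relations among time spans, not among figures.
print(time_spans(["X", ".", "X", "X"]))  # → [2, 1, 1]
```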
By constructing and deconstructing the different figural representations corresponding to different preferred hearings, learners may gain conscious access to the factors that govern the formation of those representations. These factors - Gestalt principles of temporal proximity, similarity and good continuation, and abstract concepts like balance between phrases, symmetry and motivic repetition - must be made evident to students from the beginning of teaching. These elemental, crudely stated "well-formedness rules" are perfectly accessible to the music novice and constitute a basic premise for all further learning. The educative power of the concept of preferences among competing organising principles cannot be overestimated. This concept does not play an accessory role in music reading; on the contrary, good music reading consists in evaluating in real time the odds that each different musical figure will arise from the current notation. This ability to foresee the chances of alternative groupings must not be considered just one more didactic device for gaining proficiency in note reading, but a fundamental component of musical understanding. From this standpoint the ambiguity inherent in music notation represents not a marginal effect of an inefficient system, but rather the space left for a "cognitive" interpreter.
Rhythmic complexity in most music is achieved by simple preferences competing at multiple levels. But, regrettably, the prevailing didactic approach to rhythmic training in conservatories and music schools reflects a noteworthy lack of contact with the principles of structural organisation that are at the heart of "musical intuitions". I argue that this lack of musicality is the consequence of confounding structural complexity with sheer difficulty, a confusion that in turn reveals a profound ignorance of the ways in which human cognition "composes" and in turn interprets music pieces. If it is possible "to explain artistically interesting aspects of musical structure in terms of principles that account for simpler musical phenomena" (Lerdahl & Jackendoff 1983: 2-3), this implies that artistic music teaching should always remain close to those simpler phenomena: "structural simples" (Bamberger 1991) like Gestalt principles, temporal symmetry or balance, alternation between stability and tension, and so forth.
CONCLUDING REMARKS
Aural training pedagogues must take into account what is now widely accepted in the music cognition literature: that artistic content - or musical meaning - is not an unexplainable achievement of humans, but rather the output of very refined cognitive processes that are common to the species and that are revealed by the widespread receptive musical competence of people. This implies that formal music learning - involving SMN or not - is accessible to any person with standard cognitive abilities and exposure to music. The lack of "ear" can be seen only as the failure to develop spontaneously the links between the inborn and acquired cognitive processes underlying musical understanding and a formal system of description. Keeping this in mind would reverse the attribution of responsibility for learning that we find in typical music settings, where the success of the learner depends mainly on his or her ability, or talent, to pick up the musical knowledge provided by the teacher, and a failure to do so is considered, most of the time, as beyond the reach of music pedagogy. A "cognitive teaching", on the contrary, would be attentive to the way in which learners make sense of the musical phenomenon, and their success would depend heavily on how that implicit knowledge is didactically developed - transformed from figural into formal - by the teacher.
What can be done to improve communication between pedagogues and psychologists, and who can do it? Within the great diversity of lines and approaches, it could be said that researchers in perception have done their job well. They have been steadily improving their methods, technologies and - more importantly - their theoretical frameworks during the last three decades, and there is a growing concern with achieving musical significance in their work. Perhaps music educators would be grateful if music psychologists focused more on the developmental aspects of the phenomena they investigate, and indeed it is very rare to find in academic papers an explicit reference to the transfer of their findings to music pedagogy. But with regard to the teaching profession, I am inclined to think that, after all, psychological findings are there waiting for music educators to make sense of them. If these findings are too many for the professional teacher to handle, it seems useful to outline a framework - articulating pragmatic and theoretical concerns - within which the relevance of psychological data for pedagogic ends could be assessed. This work is an attempt to foresee how that kind of synthetic framework could enrich professional music teaching.
REFERENCES
Aguilar, M. C. (1978). Método para leer y escribir música a partir de la percepción. Bs. As.: María del
Carmen Aguilar Ed.
Bamberger, J. (1991). The Mind behind the Musical Ear. Cambridge, MA: Harvard University Press.
Bigand, E. (1993). The influence of implicit harmony, rhythm and musical training on the abstraction of
"tension - relaxation schemas" in tonal musical phrases. Contemporary Music Review, 9 (1-2), 123-37.
Butler, D. (1997). Why the Gulf Between Music Perception Research and Aural Training? Bulletin of the
Council for Research in Music Education, 123, 38-48.
Carlson, R., Friberg, A., Frydén, L., Granström, B. & Sundberg, J. (1989). Speech and music performance:
parallels and contrasts. Contemporary Music Review, 4, 389-402.
Chang, H. W. & Trehub, S. E. (1977a). Auditory processing of relational information by young infants.
Papousek, M. (1987). Melodies in motherese in tonal and nontonal languages: Mandarin Chinese, Caucasian
American, and German. Presentation at the Ninth Biennial Meeting of the International Society for the Study
of Behavioural Development, Tokyo, Japan, July 1987.
Pinker, S. (1997). How the Mind Works. Allen Lane The Penguin Press.
Povel, D. J. & Okkerman, H. (1981). Accents in equitone sequences. Perception & Psychophysics, 30,
565-72.
Pynte, J. (1998). The Role of Prosody in Semantic Interpretation. Music Perception, 16 (1), 79-98.
Rakowski, A. (1999). Perceptual dimensions of pitch and their appearance in the phonological system of
music. Musicae Scientiae, 3 (1), 23-39.
Serafine, M. L. (1988). Music as Cognition. New York: Columbia University Press.
Serafine, M. L., Glassman, N. & Overbeeke, C. (1989). The Cognitive Reality of Hierarchic Structure in
Music. Music Perception, 6 (4), 397-430.
Terry, P. (1994). Musical Notation in Secondary Education: Some Aspects of Theory and Practice. British
Journal of Music Education, 11, 99-111.
Thorpe, L. A. & Trehub, S. E. (1989). Duration illusion and auditory grouping in infancy. Developmental
Psychology, 25, 122-7.
Thorpe, L. A., Trehub, S. E., Morrongiello, B. A. & Bull, D. (1988). Perceptual grouping by infants and
preschool children. Developmental Psychology, 24, 484-91.
Trehub, S., Schellenberg, E. & Hill, D. (1997). The origins of music perception and cognition: A
developmental perspective. In I. Deliège and J. A. Sloboda (Eds.) Perception and Cognition of Music, pp.
103-28. Hove: Psychology Press.
Upitis, R. (1987). Children’s Understanding of Rhythm: The Relationship between Development and Music
Training. Psychomusicology, 7 (1), 41-60.
Proceedings paper
Introduction
An established approach to the study of instrumental teaching is to investigate the interaction between
teacher and student. Some research has focused on student and teacher behaviour in this situation
(e.g. Hepler 1986, Persson 1994a, 1994b); other studies have examined the teaching of expert teachers
in order to describe the methods and strategies used in their instruction (e.g. Kennel 1997, Gholson
1998). Further studies have dealt with the effectiveness and evaluation of instrumental teaching (e.g.
Abeles 1975, Rosenthal 1984).
This research has contributed interesting perspectives and insights into what happens in the
teaching-learning situation. What these studies have in common, however, is that they treat
student-teacher interaction as an isolated practice. Teacher and student behaviour are analysed as a
two-part relationship, and the strategies of teaching practice are investigated through what may be
called an individualistic, single-context model. In my view, what has received little attention is how
this kind of teaching practice is socially and historically bounded. We know little about what shapes
the teaching activity and what cultural factors determine the choices made - in other words, what
makes the teaching the way it is. Understanding the positions of student and teacher in this kind of
pedagogic practice, as well as the logic and intentionality of their interaction, depends on viewing the
activity from a relational perspective. By understanding instrumental teaching as cultural practice,
what takes place in the teaching-learning situation can be examined in relation to professional
standards, values and practices in the wider music community. This paper discusses a theoretical basis
for understanding instrumental teaching as cultural practice, and gives examples of research questions
that are worth elucidating within such a theoretical framework.
For me, recent research on apprenticeship and on learning as situated in social practices (Lave &
Wenger 1991, Nielsen & Kvale 1997) has been an important gateway to this field. I will therefore start
by giving a short description of learning within a master-apprentice construction, and then bring these
concepts into the further discussion of instrumental teaching as cultural practice.
Within music education there is a need for studies in all fields and contexts. Naturally, choices and
limitations have to be made for individual research projects, but an important challenge is to maintain
a relational view of the totality even when focusing on particular parts of the activity. From now on I
will concentrate on teaching within this kind of pedagogical construction, more specifically on
principal-instrument teaching within higher music education. Given my own background and
experience, I have primarily the Norwegian education of classical musicians in mind.
What the student and teacher bring into the teaching-learning situation will determine what can
happen there. Among other things, the degree of agreement between the participants' frames of
reference will be important for their communication. My experience is that there is often remarkable
agreement between student and teacher in their understanding and expectations of the activity of
instrumental teaching. This suggests that many aspects of the practice are taken for granted by those
involved, which in turn may be traced to established cultural practices that have developed over time
and thereby achieved a degree of stability in the community.
According to Bourdieu's theory, cultural practices live within social fields, through ways of thinking
and behaving, rituals, myths and standards inscribed in everyday practice. A social field may be
understood as a network of relations between individual practitioners, groups of practitioners and
institutions, and is characterised by an ongoing social struggle between parties and groups in the field
over the power to define which values, standards and rules should be regarded as fundamental to the
social practice (Bourdieu and Wacquant 1992). The professional music community, for example, can
be regarded as such a field, with established ensembles, concert organisers, the record industry and
educational institutions among its central agencies. Viewing higher music education in this
perspective raises the question of how the structure and dynamics of this field create possibilities and
limitations for the content and form of the education (compare Krüger 1994). Which parties and
groups have the legitimacy to influence the education? How do different practices in the field
contribute to producing and maintaining conceptions of, for example, good music and professional
performance? For our current topic, how do these mechanisms give form to instrumental teaching?
And, in turn, how does this teaching practice influence other practices in the field?
For Bourdieu, central premises are that actions and social practices are basically relational, and that
the individual is constituted through cultural practices. The individual is also an agent of such
practices: the individual acts "on behalf of" a wider structure, and the culture in a sense operates
through its participants and is kept alive, or continued, through their actions. Instrumental teaching
may be seen as a micro-practice within a social field, where student and teacher also act as cultural
agents who, through their teaching-learning activity, contribute to the furtherance of particular
cultural practices. A master-apprentice relationship will often be asymmetric, since the teacher serves
as an agent for the professional practice in which the student wants to participate. It may well be this
familiarity with the culture that constitutes the teacher's authority and position as a master of the
discipline. In apprenticeship learning, the visibility of the profession is regarded as an important
advantage, as the apprentice has the opportunity to observe professional practitioners of the discipline
as a basis for his or her own learning process (Nielsen & Kvale 1999).
Among other things, the institutional culture operates through the language used, in speech as well as
in written curricula and other documents (Säljö 1992). The culture may also come into view through
procedures of employment, for example in the description of a position. Furthermore, the didactic
structure of the education, for instance the apprenticeship way of organising teaching and learning, is
both an effect and a producer of the institutional culture. Higher music education can be viewed as
organised trajectories of participation, where aspects of the institutional culture may determine the
social practices in which students can take part and to which they can relate their experiences.
In order to grasp the mechanisms of institutional cultures, Bernstein's notion of educational codes may
be a useful approach. Bernstein uses the concepts of framing and classification as central dimensions
of an educational code. Classification refers to the degree of separation between different subjects,
while framing concerns the extent to which teacher and student themselves can control the selection
and organisation of subject matter in the teaching-learning process (Bernstein 1971). He also
distinguishes integrated and collective codes as two general codes of knowledge, characterised by
weak and strong classification respectively. Such concepts may be useful for investigating the
horizontal and vertical relations of the discipline of instrumental teaching. Relevant aspects to study
in this connection are the maintenance of dividing lines towards other subjects in the education, and
the particular teaching-learning practice in relation to the degree of external influence exerted through
the written curriculum and other administrative documents and rules. As an example, Kingsbury
(1988) asserts that teaching at an American conservatoire seemed to a great extent to be a matter
between the individual teacher and his or her students, with staff or other teachers only infrequently
interfering. To what extent does this also describe European music education? Further, this way of
thinking can be elaborated into a wider social field: what is the relationship between the musical
profession in general and the teaching practice within the education? There may be a connection
between these levels; the teaching could, for instance, remain rather private precisely because the
professional practice of orchestral musicians is quite strongly regulated. If there is a high degree of
congruence among the cultural practices for which the teachers serve as agents, this might reduce the
need for institutional regulation of individual practice. A space for variation can then arise: when
there is unity concerning the content of the message, teaching practices that appear different may
nevertheless socialise students into similar professional practices.
Understanding instrumental teaching as cultural practice raises research questions concerning the
relationship between the participants in the teaching-learning process, how this practice is situated
within an institutional culture, and how instrumental teaching in turn takes shape through its
interrelations with other institutions and with musical and social practices within the social field. It is,
however, hard to see how empirical data from all these arenas could be handled in a single research
project. An important challenge is therefore to develop analytical approaches capable of grasping
such research questions. We need to find a way to "open up" micro-practices within a social field,
such as instrumental teaching, in order to "read" relational mechanisms out of the interactions that
take place in this practice.
Discourses may be understood as collective conceptions that are kept alive through social practice.
Such conceptions may concern ideas about mechanisms for inclusion into the professional
community, about the central knowledge of the discipline, and about how professional competency is
acquired and developed. They may include ideas about the need for a special kind of talent, about
criteria for assessing repertoire and performance, and about the weighting of performing ability
against creative ability. The discourses thus make up a repertoire for action. They set boundaries for
what is meaningful in various situations, and therefore also for what may happen in a
teaching-learning situation.
Säljö (1999) asserts that human knowledge development is largely a question of learning to master
discourses. In the education of musicians, one can say that students develop professional competency
by relating to, internalising and further developing the dominant discourses of their profession. They
gain access to these discourses through their interaction with the principal-instrument teacher, but
also through their encounters with other students and with actors and practices within the musical
community. The discourses they encounter, and through which meaning is generated from
experience, largely determine the students' learning and professional growth. With a focus on the
teaching situation itself, this becomes a question of what discursive practice the teacher brings to the
teaching. The discourses the teacher draws upon are manifested in his or her actions - verbal and
non-verbal - and in the relationship between them; the way instructions are given, the use of musical
examples, and the way the teacher positions himself or herself physically in relation to the student
can all be regarded as a "staging" of discourses. Thus the teaching can be regarded as "an ensemble of
discursive practices" (Krüger 1999). Pedagogical practice in such a one-to-one situation should
nevertheless be understood as social interaction. From the perspective of instrumental teaching at a
higher level, it is of special interest to see how this interaction is constructed in the encounter between
the discursive practice of the teacher and that of the student. Both the fact that the teaching takes
place individually, and the fact that it is an education of practitioners involving mature students, may
make the student's influence on this construction stronger than it would be in an ordinary classroom
situation.
Using discourse as an analytical concept in research implies investigating how the participants
construct the didactical space within which learning can occur. It means making explicit the
discursive rules and structures that organise the practice. How does one proceed? First, it is necessary
to gain insight into the practice as it is actually performed, not only as it is talked about and described
by the participants; observation is therefore fundamental. However, interviews may also provide
important insight, especially into how those involved think about and experience their situation. A
combination of interview and observational data may therefore provide a sound basis for the analysis.
Both interviews and observation should be relatively open in form, so that the persons interviewed
and observed have the opportunity to express themselves and act as naturally as possible. The
researcher should avoid forcing onto the material pre-determined categories that have little grounding
in the practice under study.
Traditionally, discourse analysis as a methodological approach has focused on verbal language and
other sign systems of social practice. Epistemologically, the method is grounded in social
constructionism and in more recent philosophy of language. The argument has been made that
language is not only shaped by our conceptions of reality, but that language itself in fact shapes that
reality. Linguistic analysis is one possible approach: concepts that appear central to the interaction,
the use of metaphors, and the use of personal pronouns may all express basic conceptions of the
subject and of the participants' experience of their position in the community of practice.
However, in research on instrumental teaching a focus on verbal language alone may significantly
limit the understanding of what actually takes place in the interaction. In this kind of practice, verbal
comments, gestures and musical expressions are typically woven together to form meaningful
statements. To understand how meaning is shaped in the teaching-learning situation and how the
interaction is constructed, it is therefore important to consider all kinds of social action. In order to
reveal discursive patterns, the following questions should be raised: What topics can be identified in
the interaction? What actions are repeated, and what patterns organise them? When, and how, does
the teacher intervene in the student's playing? What examples are used, and what narratives from
professional practice are brought into the situation? What is made the object of explicit discussion
and choice, and what seems to be taken for granted by the participants? Such questions may reveal
basic ideas and styles of reasoning about the subject and about the musical profession. They may
open up instrumental teaching as a micro-practice within a social field, to be explored in relation to
other practices and discourses that operate within the music community in general.
Epilogue
Within the discipline of instrumental teaching there will be a need for research at all levels and in all
arenas. The theoretical perspective discussed in this paper might primarily contribute by opening up
cultural practices and exposing aspects and mechanisms that are taken for granted, as a basis for
reflection on and refinement of the discipline. For some time now there has been a focus on the
importance of educating reflective practitioners, and of encouraging professional practitioners to
reflect upon the knowledge, values and norms embedded in their work (Schön 1983, 1987). There
are, however, reasons to question whether the determining mechanisms of this kind of social practice
can be grasped through individually based reflection. In many cases this is not only a question of the
individual's personal theory of practice, but also of ideas and conceptions that operate within the
institutions, the systems, the culture:
"If we are to engage in something called action research or "reflective teaching", we need
to ask what systems of ideas organize how we construct the objects that we are calling
schooling, children, teaching, learning, and so on. (...) The ordering of reason through the
commonsense and ordinary languages of schooling needs to be brought into focus."
(Popkewitz 1993: 27)
By focusing on collective mechanisms rather than on the individual, we might also gain new insight
into the individual participant's actions in the teaching-learning situation. For example, we could
better understand how the instrumental teacher constructs the didactical space through his or her
enactment of cultural or discursive practices. Returning to the matter of apprenticeship learning, the
following questions arise: how do the social practices of music education create possibilities and
limitations concerning the students' (a) participation in professional communities of practice, (b)
access to examples, narratives and models of professional practice, and (c) process of acquiring a
professional identity? This is not only a question of how a musical profession is learned, but also of
what kind of musician one becomes: what kinds of participation, knowledge and musical preferences
are given legitimacy in the education and are thereby communicated in the social practices of
instrumental teaching. From my point of view, this approach is an important supplement to studies in
which teacher and student behaviour are investigated on the basis of a single-context model.
References
Abeles, H. F. (1975). Student perceptions of characteristics of effective applied music instructors.
Journal of Research in Music Education, 23 (2), 147-154.
Bernstein, B. (1971). On the classification and framing of educational knowledge. In Class, Codes
and Control. London: Routledge and Kegan Paul.
Bourdieu, P. & Wacquant, L. (1992). An Invitation to Reflexive Sociology. Chicago: The University
of Chicago Press.
Gholson, S. (1998). Proximal positioning: A strategy of practice in violin pedagogy. Journal of
Research in Music Education, 46 (4), 535-545.
Hepler, L. E. (1986). The measurement of teacher/student interaction in private music lessons and its
relation to teacher field dependence/independence. Dissertation Abstracts International, 47, 2939-A.
Jørgensen, M. W. & Phillips, L. (1999). Diskursanalyse som teori og metode [Discourse analysis as
theory and method]. Frederiksberg: Roskilde Universitetsforlag.
Kennel, R. (1997). Teaching music one-to-one: A case study. Dialogue, 21, 69-81.
Kingsbury, H. (1988). Music, Talent and Performance: A Conservatory Cultural System.
Philadelphia: Temple University Press.
Krüger, T. (1994). Musikklærerutdanningen som et sosialt felt [Music teacher education as a social
field]. In P. Dyndahl & Ø. Varkøy (Eds.), Musikkpedagogiske perspektiver. Oslo: Ad Notam
Gyldendal.
Krüger, T. (1999). Undervisning som et ensemble av diskursive praksiser [Teaching as an ensemble
of discursive practices]. In Pedagogikk - normalvitenskap eller lappeteppe? Rapport: 7. nasjonale
fagkonferanse i pedagogikk, bind 2. Forskningsrapport 43/1999, pp. 219-225. Lillehammer:
Lillehammer College.
Lave, J. & Wenger, E. (1991). Situated Learning: Legitimate Peripheral Participation. Cambridge:
Cambridge University Press.
Nielsen, K. (1998). Apprenticeship in Music: Learning at the Academy of Music as Socially Situated.
Doctoral dissertation, Aarhus University, Institute of Psychology.
Nielsen, K. & Kvale, S. (1997). Current issues of apprenticeship. Nordic Journal of Educational
Research, 17, 130-139.
Nielsen, K. & Kvale, S. (Eds.) (1999). Mesterlære: Læring som sosial praksis [Apprenticeship:
Learning as social practice]. Oslo: Ad Notam Gyldendal.
Persson, R. S. (1994a). Concert musicians as teachers: On good intentions falling short. European
Journal for High Ability, 5 (1), 79-91.
Persson, R. S. (1994b). Control before shape - on mastering the clarinet: A case study on
commonsense teaching. British Journal of Music Education, 11, 223-238.
Popkewitz, T. S. (1993). Professionalization in teaching and teacher education: Some notes on its
history, ideology and potential. Paper, also published in Teaching and Teacher Education, 1994,
10 (1).
Rosenthal, R. K. (1984). The relative effects of guided model, model only, guide only, and practice
only treatments on the accuracy of advanced instrumentalists' musical performance. Journal of
Research in Music Education, 32 (4), 265-273.
Schön, D. A. (1983). The Reflective Practitioner: How Professionals Think in Action. Arena/Ashgate
Publishing Limited.
Schön, D. A. (1987). Educating the Reflective Practitioner: Towards a New Design for Teaching and
Learning in the Professions. San Francisco: Jossey-Bass.
Säljö, R. (1992). Institutioner, professioner och språk [Institutions, professions and language].
Unpublished paper.
Säljö, R. (1999). Kommunikation som arena för handling - lärande i ett diskursivt perspektiv
[Communication as an arena for action: learning in a discursive perspective]. In C. A. Säfström &
L. Östman (Eds.), Textanalys: Introduktion till syftesrelaterad kritik. Lund: Studentlitteratur.
Proceedings paper
Abstract
Recent research in the psychology of music has helped to form a clearer picture of the processes involved in learning
a musical instrument, in particular of the need for both extrinsic support and intrinsic motivation to sustain
commitment to learning. The research presented in this paper brings a new dimension to that understanding, as it
captures the thoughts and ambitions of children and their parents before the child starts lessons, drawing on
qualitative interview data to gain a clearer picture of what inspires a child to start learning an instrument in the first
place.
Introduction
Research in recent years has advanced our understanding of the processes involved in learning a musical instrument,
with notable investigations into motivation (O'Neill, 1996), practising strategies (Gruson, 1988; Hallam, 1997; 1998),
and the involvement of parents and teachers (Davidson, Howe & Sloboda, 1997). Much of this research has centred
on the established student population, profiling those children who have already begun lessons, through interview,
observation and skills analysis. As a result, we now have a clearer picture of the complex social, musical and
developmental questions involved in musical instrument learning, even though investigation of the initial motivations
and ambitions of learners has previously been largely retrospective.
The data presented here offer new insight into the stage just before learning commences, in which parents and
children are considering the task that they are about to undertake and predicting the level of commitment that learning
a musical instrument will involve. Quantitative analysis of data from the same longitudinal study has already revealed
that children bring to their music instruction expectations and values that potentially shape and influence their
subsequent development. If this preliminary evidence (McPherson, 2000) is correct, then children as young as eight
are able to differentiate between their interest in learning a musical instrument, the importance to them of being good
at music, whether they think that their learning will be useful to their short and long-term goals, and also the cost of
their participation, in terms of the effort needed to continue improving. Interestingly, children who displayed
short-term commitment to learning their instrument achieved at a lower level, irrespective of whether they were
undertaking low, moderate or high levels of musical practice; students who expressed medium-term commitment
achieved higher average scores, which increased according to the amount of their practice during the period studied.
The highest achieving students were those who displayed long-term commitment to playing coupled with high levels
of practice.
These results are consistent with findings in more general educational research (Eccles, Wigfield & Schiefele, 1998;
Wigfield et al., 1997), which show that young children's beliefs about their own personal competence and their
valuing of an activity predict how much effort they will exert on a task, their subsequent performance, and their
feelings of self-worth, even after previous performance is controlled for.
Evidence to date shows that children are able to distinguish between what they like or think is important for them
against perceptions of their own competence in a particular field (Eccles et al., 1993; Wigfield, 1994), and that
expectancy beliefs, such as self-concept, ability perceptions, and expectations for success, are effective predictors of
achievement (Wigfield & Eccles, 1992).
Methodology
The families participating in this longitudinal study include 156 young brass and woodwind players taken from eight
different primary schools in Sydney, each of which has an established instrumental teaching and school band
programme. The cohort was chosen with a balance of gender, socio-economic status and school background, and
reflects the variety of backgrounds and experiences that children bring to musical instrument learning. Over the three
year study, parents and children participated in regular interviews. The children also worked with the investigator in
research sessions that mapped out their progress across a number of performance areas, including sightreading,
performing rehearsed repertoire, memorising, playing by ear and improvisation. For the purposes of this paper,
qualitative data are drawn from the initial interviews with parents and children conducted immediately before the
child started instrumental lessons. In these interviews, participants were asked to state their reasons for beginning
musical instrument tuition, and to predict the effort and involvement that this tuition would require.
The results have been analysed for emergent themes, allowing an exploration of the different perceptions held by
parents and children about what learning an instrument and participating in the school band actually entails. For the
purposes of this paper, the focus is on the types of reasons given by parents and children, rather than their validity as
a predictor of success. It will become apparent that the perspective on learning held by the two generations can often
be significantly different, but further analysis will be necessary to assess whether these differences are in any way
responsible for dropout rates and loss of motivation.
Results
Taking an overall view of children's and parents' responses to questions about motivation and expectations on starting
to play an instrument reveals many emerging links across and within the two groups. Unsurprisingly, parents tend to
be more articulate about their educational reasons for encouraging their child to start instrumental lessons and join the
band - two activities that are connected for the majority of the children in the survey - whereas children's comments
focus more on having fun and enjoying the experience of being in the school band.
Extracting themes by frequency of occurrence, the responses can be broadly grouped in the following categories:
Table A: Children's responses

2. Following a role model or peer group
   'My sister is in school band and she says it's really fun'
   'Lots of other people are doing it - my friends are all in band'
   'My brother suggested it because he plays drum kit and we could form a band'

3. Fulfilling an ambition
   'Since year one I've always wanted to play trumpet'
   'I wanted to play it since last year when I saw the band playing'

4. Following parental advice or expectations
   'Mum thought it would be really nice - she likes hearing instruments and she likes teaching them'
   'My last year's school report said my music was bad so Mum said I should do the band to help me'

5. Extra-musical benefits
   'The band camps sounded interesting - being away from home and trying out different sorts of foods'
   'I like the way that everyone has to wake up early [for band rehearsals] and we don't have an excuse to wake up
   early in the morning unless I have to go somewhere'
   'It's a nice heavy instrument so you can like get muscles and stuff carrying it around'
   'I wanted to get one of those little trolleys so that I could use it for other stuff too'

Table B: Parents' responses

1. Decision generated by the child
   'We're not a music learning family - I'd never offered it to him'
   'He was hanging out to be old enough to join. We discussed it with him and decided to let him, to help boost his
   confidence, which has worked'
   'It's not something I would have chosen for her, but she convinced me. I'm happy and I hope it lasts'

2. Following a role model or peer group
   'He knew what band was about because his brother's been in it for three years, so he's been looking forward to
   joining'
   'She's been to many band practices with her brothers and always assumed she'd join too'
   'It came about through her sister learning, and seeing Lisa Simpson play sax'

3. Following school expectations
   'There's never been any question about it, it's always been assumed she'd join when she got to Year 3 - it's so
   much a part of the school'
   'No real decision - it's part of the school ethos'

4. Following parental expectations
   'We talked about it as a family, the cost and commitment, and said we wanted her to stick with it for a year'
   'We just presumed she would, and she just expected to go in the band'
   'It was always expected by her and by us. If she'd said she didn't want to, we would have had to look seriously at
   the family dynamics'
Discussion
A surface-level analysis reveals immediately that for children who are starting musical instrument tuition and band
membership the focus is on having fun and being with friends, whereas for parents, a longer-term perspective on the
educational value is prominent, along with a much clearer acceptance that band participation is an accepted part of
the primary school programme. In other words, the decision seems more momentous to the child, who has often been
waiting for some time to join the band and anticipates gaining enjoyment and satisfaction from doing so. For parents,
by contrast, working from the broader perspective of what other children and their own offspring have previously
done, learning an instrument is often seen as a 'foregone conclusion', expected by them or by the school. Thus parents and their children
perceive the significance of the decision differently, creating a potential conflict before the instrumental tuition has
even begun. Typically, parents are more aware of the serious implications of learning, including the costs and
commitment to practice and rehearsals, whereas children are looking forward to gaining enjoyment from being a
member of the band. Once again, the two parties are starting out with different expectations, where the child
anticipates fairly immediate, pleasurable results, whilst the parent looks to the possible challenges of sustaining
motivation beyond the initial enthusiasm. In some families that were interviewed, deliberate efforts had been made by
the parents to point out these longer term consequences to the child through negotiating expected practice times, or
stipulating that the instrument should be tried for at least a year.
There are also differences in the way that parents and children perceive the school opportunities, which are usually
centred around band rehearsals, often held weekly before school. Many children cite the experience of having heard
the band as being a motivation for taking up an instrument, and whether they have connected with the sounds they
have heard or the prospect of going on band camp, it is clear that membership of the band is an important part of
school for them. Parental reactions to this question differ, with some parents sharing the children's
impatience to be in the right year for joining the band, and others feeling somewhat pressured into allowing their
child to participate. Particularly where the child is the first of the family to be involved, some parents claimed to be
unaware of the opportunities available until their child came home with information, and as a result those parents
expressed concerns about the costs and commitment involved. Others acknowledge the prominence of the band in the
school's ethos, and appear to have received clear information from an induction evening held in school. Getting
accurate information across to all parents is always difficult, but in this case it serves as an illustration of the different
expectations and levels of support that different children and their families bring to the band.
Children and parents involved in our study were also asked about the child's choice of instrument, and their responses
demonstrated that children, in particular, often had very clear ideas about the instrument that they wanted to play.
Their reasons ranged from having seen a sibling, friend or public figure play the instrument, to liking the sound or
some physical feature of the instrument. Parents' reasons ranged across many categories, but were often more
practical than the children's, including considerations of how loud the instrument would sound in a small flat, or how
easily the child would be able to carry it on public transport. All these reasons were overshadowed by the audition
process in operation at most of the primary schools involved, where children would be tested on different instruments
and informed of their suitability. Many children therefore ended up playing instruments other than the one they had
been hoping for, and although most parents and children seemed to be resigned to the school's choice, it is possible
that the mismatch between the ideal choice and the reality could put a dampener on their initial enthusiasm.
Most parents anticipated that their child's 'worst thing' would be practising, and indeed, quantitative analysis of data
from the same study has revealed that unrealistic expectations about practice can predict failure to continue beyond
the first nine months of tuition on the instrument (McPherson, 2000). These effects were particularly marked where
the mother had not previously learnt a musical instrument, suggesting that parents learn from their own and other
children's experiences, placing only children and eldest children from 'non-performing' families at something of a
disadvantage. The reluctance of some of these mothers to allow their children to become involved in band comes
across clearly in the qualitative data, where they express concerns about the pressures and commitments involved in
learning, contrasting with those parents who are more relaxed about letting their children take the opportunities in the
hope of them being of benefit.
Conclusions
From the data presented here, parental and child attitudes to learning can be seen to be varied and strongly held,
painting a vivid picture of the different expectations that children and their families bring to instrumental learning.
These results do not attempt to analyse the direct effects of different attitudes upon children's success in learning, as
the relationship between such complex factors is beyond the scope of this paper. However, it is evident that this
previously neglected area of research has much to offer our understanding of instrumental learning, motivation and
musical perception amongst young children.
Broadly speaking, it is apparent that the majority of parents have a clearer idea of the potentially negative aspects of
instrumental learning, such as cost, commitment and practice, than do their children, who are more aware of the
elements of sociability and fun involved in being in the band. It could be argued that to make children more aware of
the realities of learning an instrument, as some of the parents in our study tried to do, could be potentially off-putting,
squashing the enthusiasm that sustains most of the children through the initial stages of learning. We would suggest
that more useful intervention could be made after the first few months of learning, to address the difficulties that arise
when the less committed novices encounter the inevitable need for increased effort for less reward, having got past
the initial stages of learning. Identifying and making goals more manageable at this stage would help to give children
a clearer perspective of the task that they had embarked upon, whilst educating parents in strategies for supporting
practice would create closer links between parental and child expectations of learning. It is certainly the case that
further investigation is necessary in order to foster the ideals and hopes that children bring to learning, and to sustain
their enjoyment and development of their instrument, and of music.
References
Davidson, J. W., Howe, M. J. A. & Sloboda, J. A. (1997) 'Environmental factors in the development of musical
performance skill in the first twenty years of life', in Hargreaves & North (Eds), The Social Psychology of
Music (pp. 188-206). Oxford: Oxford University Press.
Eccles, J., Wigfield, A., Harold, R. D., & Blumenfeld, P. (1993). Age and gender differences in children's self-
and task perceptions during elementary school. Child Development, 64, 830-847.
Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In W. Damon (Series Ed.) and N.
Eisenberg (Ed.), Handbook of child psychology (5th ed., Vol. 3): Social, emotional and personality
development. (pp. 1017-1095). New York: Wiley.
Gruson, M. L. (1988) 'Rehearsal skill and musical competence: Does practice make perfect?', in Sloboda (Ed.)
Generative Processes in Music: The psychology of performance, improvisation and composition. Oxford:
Clarendon Press.
Hallam, S. (1997) 'Approaches to instrumental practice of experts and novices: Implications for education', in
Jørgensen & Lehmann (Eds), Does practice make perfect? Current theory and research on instrumental music
practice (pp. 89-107). Oslo: Norges Musikkhøgskole.
Hallam, S. (1998) Instrumental Teaching: A practical guide to better teaching and learning. Oxford:
Heinemann.
McPherson, G. E. (2000). Commitment and practice: Key ingredients for achievement during the early stages
of learning a musical instrument. Paper to be presented at the Eighteenth International Research Seminar of the
International Society for Music Education, Salt Lake City, July, 2000.
O'Neill, S. A. (1996) Factors influencing children's motivation and achievement during the first year of
instrumental music tuition. PhD thesis, University of Keele.
Wigfield, A., & Eccles, J. S. (1992). The development of achievement task values: A theoretical analysis.
Developmental Review, 12, 265-310.
Back to index
Proceedings abstract
Is the Mozart effect "debunked"?
Frances H. Rauscher
Background:
The finding of a significant difference in spatial-temporal task scores for college students who listened
to ten minutes of a Mozart sonata compared to relaxation instructions or silence has been challenged.
Although the effect has been replicated ten times in seven autonomous laboratories, three recent
reports by Steele and his colleagues have generated media accounts claiming that the "Mozart effect"
is "debunked."
Aims:
The goal is to review the literature on the Mozart effect, and to place the work in context.
Main contributions:
The Mozart effect has been investigated by neuroscientists, psychologists, and educators. The original
studies were motivated by a neural network model of higher brain function developed by Gordon
Shaw and his colleagues. Shaw's model predicted that specific music might excite cortical firing
patterns used for spatial-temporal task performance. Some psychologists attempting to replicate the
work, however, were largely unfamiliar with Shaw's model and its predictions, and mistakenly
employed a battery of inappropriate dependent measures. Relevant experimental considerations (i.e.,
practice effects, task difficulty, experimenter effects, task-to-task priming, and attention) were also
largely overlooked. Researchers using appropriate dependent measures and experimental designs have
largely succeeded in replicating the effect. The Mozart effect is further supported by research on the
effects of early music instruction on spatial-temporal task performance, by studies with Alzheimer's patients,
rats, and epileptics, and by studies using EEG and fMRI.
Implications:
The Mozart effect is not "debunked." It is based on sound neuroscientific principles and is supported
by psychological and educational research. Researchers should examine the converging evidence
before attempting replications.
Back to index
Proceedings paper
The Influence of Parental Attitudes and Support on Children's Engagement in Instrumental Music
Katherine J. Ryan and Susan A. O'Neill
Department of Psychology, Keele University
Introduction
Although only limited research investigating participation and motivation has been carried out in the music domain,
this area has received growing interest across other domains. During the last fifty years a substantial amount of
research has been performed examining the role of specific parenting practices and beliefs on children's motivation and
performance outcomes (for an overview see Eccles, Wigfield & Schiefele, 1998). The finding that parents' educational
expectations and beliefs affect their children's educational aspirations has been replicated on a variety of age groups,
nationalities, and races (Seginer, 1983). Researchers who have assessed parents' beliefs about their children's abilities have
found that parents are reasonably accurate at estimating their children's general abilities (Galper, Wigfield & Seefeldt
1997). Furthermore, parents' beliefs about their children's abilities have been found to influence children's performance in
school. Hess, Holloway, Dickson, and Price (1984) found that mothers' expectations for their children's academic
performance predicted their children's reading readiness scores. Other research has established a positive relationship
between parents' expectations for their children's achievement behaviours and children's actual behaviours (e.g. Crandall
1969; Winterbottom, 1958).
Eccles suggests that parents have the biggest impact as conveyors of expectancies regarding their children's abilities. Eccles
[Parsons], Adler & Kaczala (1982) found that parents' beliefs about their children's maths ability had a stronger influence
on children's own maths ability beliefs than either parents' role modelling of different activities or the children's own grades
in school. These findings were repeated in a more recent study (Frome & Eccles, 1998) confirming earlier hypotheses that
parents act as expectancy socialisers for their children and that children's self-perceptions reflect children's and parents'
interpretation of reality in addition to reality itself. These findings highlight how influential parents' beliefs and expectations
are to their children's own beliefs. It also suggests that in the case of a voluntary activity, such as instrumental music, these
beliefs may have a considerable impact on the level of support provided.
Much of the related research in the music domain has focused on understanding and predicting the development of musical
achievement and performance skills. Research suggests that parents provide an important source of motivation, by
generating and sustaining children's interest and commitment to music lessons, both initially and in the long-term (e.g.
Davidson, Howe, Moore & Sloboda, 1996; O'Neill, 1994; Sloboda & Howe, 1991; Sosniak, 1985, 1990). The active
support provided by parents, such as attending lessons, obtaining feedback from teachers and supervising daily practice has
been identified as a major contributory factor in the development of musical performance skills in successful musicians
(Davidson et al, 1996). Many high-achieving musicians reported that without this level of active parental intervention they
would not have spent so many hours practising. Indeed, in a study investigating children's motivation for instrumental
music, Yoon (1997) suggests that the parents' level of involvement in their children's musical activities can influence
children's perceptions of their parents' valuing of music, and their motivation for participating in the activity. The majority
of past research in this area has tended to focus on key behaviours (such as supervision of practice) rather than the influence
parental attitudes and expectations may have on young people's involvement in music. Yet according to Csikszentmihalyi,
Rathunde, and Whalen (1993), for young people to transform their potential into actuality, 'the support of family, in both
material and psychological terms, is essential'. They found that one of the key factors which separated 'talented' from
'average' young people in a variety of domains, including music, was that talented individuals tended to come from families
which provided a stable environment where individuals felt a sense of support, whilst at the same time members were
encouraged to develop their individuality by seeking out new challenges and opportunities. These findings are similar to
other research regarding the importance of parental influence in children's development across a variety of domains, such as
sport and maths (e.g. Eccles et al, 1998).
There is no doubt that parents play a significant role in children's motivation and participation in activities. Previous
research has demonstrated that the motivation to take up and persist with playing an instrument is inextricably linked to the
social and educational environment. However, very little empirical work has been done to investigate the combined influence of
parental attitudes and support on children's engagement in instrumental music. The present study aims to examine these
influences and the types of support most likely to lead to successful outcomes and high levels of engagement.
Method
Participants
The present study is linked to a longitudinal project investigating the social and motivational factors influencing young
people's participation and achievement in music. Parents of Year 6 children (aged 10-11 years) were recruited through the
33 primary schools in North Staffordshire taking part in the larger project.
Procedure
The parents completed a questionnaire designed to assess the influence of their child-specific beliefs and their support on
children's engagement in instrumental music. All items were answered using 7-point Likert scales, except where categorical
responses were required, such as "Does your child play a musical instrument?" (yes or no). The questionnaires were taken
home and later returned to school by the children. Five hundred and six questionnaires were completed and returned for
collection.
Measures
Musical Engagement
The children's level of musical engagement was measured by two factors. Firstly, the child's level of participation for which
parents were asked whether their child currently played, or had ever given up, a musical instrument. Secondly, parents of
children who currently played an instrument were asked to indicate the number of hours their child spent playing an
instrument in an average week, e.g. 1 hour or less, 2-3 hours, etc. The responses to these questions were categorical. (See
Appendix A for full list of items).
Child-Specific Beliefs
The parents' child-specific beliefs about music were measured by asking parents to rate their child's 'competence' at, and
'value' of, instrumental music, along with perceived 'effort' and their own 'expectation' for the child over the next year. For
example, 'How good do you think your child is at instrumental music?' with scale anchors of (1) not at all good to (7) very
good. All were single items, except for 'value' which had two items (liking and importance). These two items were
collapsed into a mean composite score for all analyses. (See Appendix B for scale and full anchor details.)
Parent-Specific Beliefs
The parents' beliefs about the support they offered were measured by asking parents to rate how much they felt their support
had improved their child's performance during the last year, and how confident they felt about their future support. For
example, 'How much do you think you have improved your child's performance in instrumental music during the last year?'
with scale anchors of (1) very little to (7) a great deal. (See Appendix C for scale and full anchor details.)
Level and Type of Support Provided
Parents of children who presently played an instrument were asked to rate the level of involvement and type of support they
provided for their child on a real-time scale. The 'Support' scale began with the phrase "How often do you...." with 7
real-time scale anchors from (1) never to (7) everyday (30 minutes or more). A principal component analysis with varimax
rotation confirmed that there were three factors, accounting for 66% of the variance. The three factors were interpreted as
encouragement (e.g. 'How often do you encourage your child to practise?'), negotiation (e.g. 'How often do you reward
your child for practising?') and active involvement (e.g. 'How often do you play an instrument with your child?'). Scale
scores were created from the mean composite scores for 'Encouragement' (6 items, α = .8775), 'Negotiation' (3 items, α
= .5739) and 'Active involvement' (2 items, α = .7231) and used in all further analyses. (See Appendix D for scale and full
anchor details.)
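The scale-construction procedure described above (checking internal consistency, then averaging items into a composite) can be illustrated with a minimal sketch. This is hypothetical: the ratings, the 3-item subscale, and the `cronbach_alpha` helper are invented for the example and are not the study's data or code.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of respondents' totals
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical 1-7 real-time ratings from five parents on a 3-item subscale
ratings = np.array([[7, 6, 7],
                    [2, 3, 2],
                    [5, 5, 6],
                    [1, 2, 1],
                    [6, 7, 6]])

alpha = cronbach_alpha(ratings)    # internal consistency of the subscale
composite = ratings.mean(axis=1)   # mean composite score per respondent
```

With items this strongly correlated, alpha approaches 1; composites like these are what enter the later ANOVAs and correlations.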
Results
The results are presented in four sections. First, descriptives of the children's level of engagement are presented. Secondly,
the influence of parents' child-specific beliefs on their child's level of engagement is described, followed by the influence
of parents' beliefs about their support. Finally, details of the level and type of support provided by parents and its
relation to children's level of engagement are presented.
Children's Level of Engagement
The children were assigned to a categorical cohort based on their parents' designation. The three cohorts are (1) Players
(children who presently play an instrument (n =267)), (2) Gave ups (children who had previously played an instrument but
given up (n=131)), and (3) Non-players (children who have never played an instrument (n=108)). These cohorts were used
in further analyses.
Parents' Child-specific Beliefs and Children's Level of Engagement
A series of one-way analyses of variance (ANOVA) identified significant differences between the parents' child-specific
beliefs for children in the different cohorts. A summary of the results, with means and standard deviations, is displayed
below in Table 1.
Table 1
Mean Scores and Standard Deviations for Parents' Child-Specific Beliefs and Cohort
Further post-hoc analyses of cohort identified that parents of children who currently played an instrument believed their
child to be significantly more competent than children who had given up or never played. Yet parents of children who had
given up still perceived their children to be more competent than children who have never played. Parents of children who
currently play also perceived their child as having a higher value for instrumental music, applying more effort and predicted
that they would do better in the following year than children who had given up or never played.
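A one-way ANOVA of the kind reported in these sections can be sketched as follows. The ratings and group sizes below are invented for illustration and do not reproduce the study's data.

```python
from scipy.stats import f_oneway

# Hypothetical 1-7 competence ratings grouped by the three cohorts
players = [6, 7, 5, 6, 7, 6]
gave_ups = [4, 5, 3, 4, 4, 5]
non_players = [2, 3, 2, 1, 3, 2]

# One-way ANOVA: do the cohort means differ more than chance would allow?
f_stat, p_value = f_oneway(players, gave_ups, non_players)
```

A significant F statistic only indicates that the cohorts differ somewhere; the paper's post-hoc comparisons are what locate which pairs of cohorts differ.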
Parent-Specific Beliefs about Their Own Support and Children's Musical Participation
One-way analyses of variance (ANOVA) identified significant differences between the parents' beliefs about their support
for children in different cohorts. A summary of the results, with means and standard deviations, is displayed below in
Table 2.
Table 2
Mean Scores and Standard Deviations for Parent-Specific Beliefs of Support and Cohort
                 Players        Gave ups       Non-players
                 M      SD      M      SD      M      SD       F        df    p
Current support  4.45   1.8     2.68   1.7     2.14   1.6      85.232   2     .0001
Future support   4.52   1.9     3.56   2.1     2.01   2.0      28.085   2     .0001
Further post-hoc analyses of cohort identified that parents of children who currently played an instrument believed they had
improved their child's performance at instrumental music significantly more than parents of children who had given up or
never played. They were also more confident in their ability to improve their child's performance in instrumental music
over the next year.
The Type and Level of Parental Support Provided and Children's Level of Engagement
To examine how the type of parental support provided influences children's level of engagement, correlations were
performed between the composite 'Support' scales and the number of hours children were reported to play during an
average week. 'Encouragement' support was found to be the most highly correlated to children's level of engagement in
playing an instrument (r = .408, p < .0001). Correlations between the number of hours played and the two other types of
support were very low, with only one reaching significance ('Active involvement' r = .140, p < .05; 'Negotiation' r = .051,
p = .432). This suggests that the 'Encouragement' type of support will be more likely to lead to successful outcomes and
higher levels of engagement.
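The correlational analysis can be sketched in the same way; the weekly hours and 'Encouragement' composite scores below are hypothetical values for illustration only.

```python
from scipy.stats import pearsonr

# Hypothetical weekly playing hours and 'Encouragement' composite scores
hours = [1, 1, 2, 3, 3, 4, 5, 6]
encouragement = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.5, 6.0]

# Pearson correlation between hours played and encouragement support
r, p = pearsonr(hours, encouragement)
```

As in the paper, a positive r indexes association only; it does not by itself establish whether support drives playing time or vice versa.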
Discussion
The significant effect of cohort identified in parents' child-specific beliefs confirms the important role of parental beliefs
and expectations on children's level of engagement. Parents of children who currently play instruments believe their child is
significantly more competent than parents of children who have given up or never played. It is to be expected that the
child's current, or previous, level of participation will influence the parents' beliefs about ability. This was demonstrated by
the parents of children who have given up also perceiving their children to be more competent than parents of children who
have never played. However, it could be argued that parents who have high beliefs about their child's abilities will convey a
high value for music and therefore indirectly, as well as actively, encourage the child in playing an instrument, resulting in
them applying more effort and becoming more competent. This is in line with previous research in other domains which
found that parents' beliefs and values have a strong influence on children's own beliefs and competence. It is also supported
here by the findings that parents of children who currently play also perceive their child to have a higher value for
instrumental music, apply more effort and predict them to do better in the following year than children who have given up
or never played.
Parents of children who currently play an instrument also believe they have improved their child's performance at
instrumental music during the last year significantly more than parents of children who have given up or never played. Due
to their children being currently involved it is not surprising that parents of players should hold this belief more strongly
than other parents. Parents who are actively involved in supporting their children are more likely to believe that their
support has improved their child's ability than parents who do not have the chance, or choose not to be involved. Parents of
children who currently play are also more confident in their ability to improve their child's performance in instrumental
music over the next year. As previous research has found that parents' level of involvement is perceived as an indication of
their valuing of music, which influences their child's motivation for participating, it is likely that children with parents who
are confident about their ability to provide support and assistance will continue playing longer than those whose parents are
not involved or are less confident in their ability to provide support.
Upon examination of the types of support parents provide, 'Encouragement' was found to be the most highly correlated to
children's level of engagement in playing an instrument. This suggests that support involving encouragement will be more
likely to lead to successful outcomes and higher levels of engagement. 'Encouragement' support included behaviours such
as providing praise, helping the child find time to practise, listening to practice, and attending performances. This finding
suggests that general support, such as encouraging practice, promotes the highest levels of engagement, and is in line with
previous research on higher achieving musicians. As this style of support does not require any musical skills, such as
reading music or playing an instrument, it can be provided by any parent, suggesting that it is possible for all parents to
support their children in a manner which can lead to successful outcomes. The findings contribute to the development of
theory by increasing our understanding of the ways in which parental attitudes and support might influence children's
motivation and engagement during the early stages of learning to play a musical instrument and indicate a direction for
future research in this area.
References
Crandall, V.C. (1969). Sex differences in expectancy of intellectual and academic reinforcement. In C.P. Smith (Ed.).
Achievement-related behaviours in children. New York: Russell Sage.
Csikszentmihalyi, M., Rathunde, K., and Whalen, S. (1993). Talented teenagers: The roots of success and failure.
Cambridge: Cambridge University Press.
Davidson, J.W., Howe, M.J.A., Moore, D.G, and Sloboda, J.A. (1996). The role of parental influences in the development
of musical ability. British Journal of Developmental Psychology, 14, 399-412.
Eccles Parsons, J.A., Adler, T.F., and Kaczala, C.M. (1982). Socialization of achievement attitudes and beliefs: Parental
influences. Child Development, 53, 310-321.
Eccles, J.A., Wigfield, A. and Schiefele, U. (1998). Motivation to succeed. In Damon, W. (Ed.). The Handbook of Child
Psychology, Vol. 3, 1017-1095.
Frome, P.M. and Eccles, J.A. (1998). Parents' influence on children's achievement-related perceptions. Journal of
Personality and Social Psychology, Vol. 74, 2, 435-452.
Galper, A., Wigfield, A., and Seefeldt, C. (1997). Head start parents' beliefs about their children's abilities, task values, and
performances on different activities. Child Development, Vol. 68, 5, 897-907.
Hess, R.D., Holloway, S.D., Dickson, W.P., and Price, G.L. (1984). Maternal variables as predictors of children's school
reading and later achievement in vocabulary and mathematics in sixth grade. Child Development, 59, 259-285.
O'Neill, S.A.(1994). Musical development: Aural. In A. Kemp (Ed.). Principles and processes of music teaching. Reading:
International Centre for Research in Music Education. pp. 1043.
Seginer, R. (1983). Parents' educational expectations and children's academic achievements: A literature review.
Merrill-Palmer Quarterly, Vol. 29, 1,1-23.
Sloboda, J.A. and Howe, M.J.A. (1991). Biographical precursors of musical excellence: An interview study. Psychology of
Music, 19, 3-21.
Sosniak, L.A.(1985). Learning to be a concert pianist. In B.S. Bloom (Ed.). Developing Talent in Young People. New York:
Ballantine.
Sosniak, L.A.(1990). The tortoise, the hare, and the development of talent. In M.J.A. Howe (Ed.). Encouraging the
Development of Exceptional Abilities and Talents. Leicester: The British Psychological Society.
Yoon, K.S. (1997). Exploring children's motivation for instrumental music. Paper presented at the biennial meeting of the
Society for Research in Child Development, Washington.
Winterbottom, M.R. (1958). The relation of need for achievement to learning experiences in independence and mastery. In
J.W. Atkinson (Ed.). Motives in fantasy, action, and society. Princeton, N.J.: Van Nostrand.
Back to index
Appendix A
Musical Engagement Items
Does your child play a musical instrument now? (please circle) YES NO
On average, how many hours each week does your child spend playing an instrument?
Appendix B
Child-Specific Beliefs
Appendix C
Parent-Specific Beliefs
Very little A great deal
Appendix D
Level and Type of Support Provided
How often do you........ Never A few A few Weekly A few Everyday Everyday
times a times a times a (less than (30 mins
year month week 30 mins) or more)
Proceedings paper
The social cost of expertise: personality differences in string players and their implications for the audition
process and musical training
Daina Stepanauskas
Background
Not only the number of applicants for places in an orchestra (Rinderspacher, 2000), but also their standard of
competence is rising from year to year. This study will examine personality differences in string players and the
consequences for the audition process and musical training which arise from these. The violin professor H.
Schneeberger (Noltensmeier, 1997) comments: "Twenty to twenty-five years ago, we could quite happily advise
violin players to apply for positions as concertmasters, but today we should really sometimes be telling them that
they should be glad to find a place in one of the last seats of a good orchestra". As those musicians who excel in
solo performance tend to be selected for the orchestra, good preparation for solo audition playing is the most
important part of musical training (Griffing, 1994).
However, orchestra members criticise the fact that auditionees neglect to prepare the excerpts (extracts from
orchestral works, employed in the audition process) thoroughly. In the final round of auditioning, an alarming
discrepancy emerges: musicians with highly-trained abilities in solo playing - indeed, the very best players, who
make it through to the final round of auditioning - are startlingly unprepared for the practical demands of
orchestral playing. Such orchestral work may at first glance appear less demanding than solo pieces; in
reality, however, the body of orchestral literature contains works of as great a difficulty, demanding just as
much skill (albeit of a somewhat different kind) as solo pieces.
The majority of successful auditionees fill tutti positions within the orchestra and are hence no longer required to
perform solos. Musicians experience this transition from an individualistic to a group role in very different ways.
Some are relieved at having successfully mastered the stressful audition period, whereas others have difficulties
integrating into their section and miss the challenge of solos. The award of a place in an orchestra is succeeded
by a one-year trial period, during which failure is more often due to such problems of integration than to
insufficient ability. Such cases demonstrate the importance of adaptation to the orchestral group situation for
successful and permanent integration into the orchestra.
Aims of the study
Popular belief among orchestra musicians has it that problems of integration and adaptation, and indeed greater individualism, occur more evidently and frequently among violinists than among other string players. I supposed
that such differences, if they exist, would surely be most evident between violinists and double bassists. The
reason for this supposition lies in the different roles these two instruments tend to take in orchestral music. The
parts double-bassists are required to play in art music are by no means solos, but rather supporting parts. The
instrument has a primarily accompanying function. A double-bassist will not generally have ambitions to be a
soloist. He or she is required to acclimatise him/herself to a supporting role within the group from the very
beginning. Are violinists, then, different from double-bassists? Or are particularly good musicians, regardless of
which instrument they play, more individualistic than their colleagues? Are there thus no differences between
players of each instrument? Or is it possibly the case that these differences are not to be found between either
instrument groups or different levels of musical competence, but that they rather occur equally distributed across
all instruments and competence levels? Each of these three possibilities has various arguments speaking for it.
But which one is in fact accurate? This question is the subject of my TICOM study (Test of Individualism and
Collectivism of Orchestra Musicians). The study examines personality traits which are closely connected to
individualism and collectivism. I here define individualism as a person's viewing liberty of self-development and
variety of life choices as a basic requirement for personal happiness and satisfaction. Collectivism is defined as
the sense of belonging to a group and requiring membership of this community as the basis for personal
happiness and satisfaction (Triandis, 1995; Triandis & Gelfand 1998). Individualism, then, would imply a greater
reluctance to work simply as a member of the group, here: the orchestra, and a pursuit of one's own personal
development.
Method
The final sample of the TICOM study, 121 music students from 12 German music academies, took part in the study in the winter semester 1998/99 and the summer semester 1999 by completing a questionnaire. The students were divided into groups according to whether their main instrument
was violin or double-bass, and according to level of competence into two further groups, the "best" and the
"good" group. The criterion for the "best" group was that a student had in the last three years, with their present
main instrument, taken part in at least one national or international competition, in which s/he had successfully
reached at least the second round. This resulted in four groups: best violinists, good violinists, best
double-bassists, and good double-bassists. The questionnaire consisted of particular scales based on the German
version of the Personality Research Form (PRF), one of the most commonly used personality questionnaires;
scales from the ITAM (International Test of Achievement Motivation), a newly developed personality
questionnaire; and scales based on Instrument 1 (I1) (Triandis, 1995). These scales measure particular personality traits (see Results). Each of these scales contained various items, i.e. statements the participant was required to rate on a seven-point Likert scale of agreement, running from 1 (strongly disagree) to 7 (strongly agree). The final analysis examined data from four categories: best violinists, good violinists, best double-bassists and good double-bassists. Subjects were grouped by two factors, each with two levels: "instrument" (violin/double-bass) and "competence" (best/good). For each of the scales (such as affiliation, dominance, etc.), a separate two-way AOV was carried out. The analysis of variance posed the following questions for each scale (dependent variable):
1. Are there significant differences in the mean scores for the personality traits between violinists and
double-bassists? (effect of factor "instrument")
2. Are there significant differences in the mean scores in the personality traits between the good and the best
students? (effect of factor "competence")
3. Finally, is there an interaction between the two main effects of instrument and competence?
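This 2 x 2 design can be sketched as follows. The code below runs a balanced two-way analysis of variance on synthetic seven-point ratings; the cell means, standard deviations, and group sizes are invented for illustration and are not the TICOM data:

```python
import numpy as np
from scipy import stats

# Illustrative two-way ANOVA on synthetic Likert-type scores (NOT the
# TICOM data): factor A = instrument (violin/double-bass),
# factor B = competence (best/good), n subjects per cell.
rng = np.random.default_rng(0)
n = 30
cells = {
    ("violin", "best"): rng.normal(5.0, 1.0, n),  # e.g. dominance scores
    ("violin", "good"): rng.normal(4.0, 1.0, n),
    ("bass",   "best"): rng.normal(4.2, 1.0, n),
    ("bass",   "good"): rng.normal(4.1, 1.0, n),
}

data = np.stack(list(cells.values())).reshape(2, 2, n)  # A x B x n
grand = data.mean()
a_means = data.mean(axis=(1, 2))        # instrument marginal means
b_means = data.mean(axis=(0, 2))        # competence marginal means
cell_means = data.mean(axis=2)

# Sums of squares for the two main effects, the interaction, and error
ss_a = 2 * n * np.sum((a_means - grand) ** 2)
ss_b = 2 * n * np.sum((b_means - grand) ** 2)
ss_ab = n * np.sum((cell_means - a_means[:, None] - b_means[None, :] + grand) ** 2)
ss_err = np.sum((data - cell_means[:, :, None]) ** 2)

df_err = 2 * 2 * (n - 1)
for name, ss in [("instrument", ss_a), ("competence", ss_b), ("interaction", ss_ab)]:
    F = (ss / 1) / (ss_err / df_err)      # each effect has 1 df in a 2x2 design
    p = stats.f.sf(F, 1, df_err)
    print(f"{name}: F = {F:.2f}, p = {p:.4f}")
```

Each of the three printed tests answers one of the questions above for a single dependent scale.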
Results
An effect of competence appeared in every trait where significant differences were discernible (see below). The "best" students demonstrated higher mean scores in every trait except affiliation, in which they had lower mean scores. Affiliation bears an inverse relation to the other traits studied; it implies a seeking out of the company of
other people and an ability to co-operate well with them, while the other traits measured imply higher
individualism and the assumption or acceptance of a competitive or hierarchical relation.
Traits in which significant differences were discernible:
Affiliation (PRF): high scores in this trait indicate a tendency to seek the company of other people.
Dominance (PRF): high scores indicate the tendency to aim to reach the top of a given hierarchy or occupy a leading
position in the group.
Exhibition (PRF): high scores indicate that the person enjoys being in the lime-light, receiving favourable attention,
etc.
Status orientation (ITAM): a consciousness of status and a consequent orientation of oneself according to it.
Competitiveness (ITAM): a need to compare oneself with, and the desire to achieve more than, others.
Vertical individualism (I1): emphasis on free choices and liberty of development, differing from horizontal individualism in
that vertical individualism implies an acceptance of hierarchical relations between individuals.
The main effect of competence was significant for affiliation (p<0.05), dominance (p<0.01), exhibition (p<0.01), status orientation (p<0.01), competitiveness (p<0.05) and vertical individualism (p<0.05). A main effect of instrument, however, occurred in only two traits: dominance (p<0.05) and exhibition (p<0.05). The third effect examined,
the effect of interaction between the two main effects of competence and instrument, likewise appeared in only
two traits, namely dominance (p<0.05) and status orientation (p<0.05). In interpreting the results, one should be aware that there is no one-to-one correspondence between statistical findings and substantive effects. To illustrate this, take the mean values for dominance as an example. The F-test yields significant values for instrument, competence, and the interaction term. Examination of the mean values, however, gives a different impression (see figure 1). Further analysis by the Newman-Keuls procedure suggests a contrast between the best violinists and the remaining three groups, which appear to be homogeneous.
The associated contrast vector is c=(3,-1,-1,-1), where the co-ordinates correspond to the groups in the order
given in the bar chart. It is plausible that this contrast is reflected by effects in all subtests of the two-way AOV
since their corresponding contrast vectors are (1,1,-1,-1) for instrument, (1,-1,1,-1) for competence, and (1,-1,-1,1) for the interaction, and each correlates positively with the vector c. At first glance, the significant result in the
instrument contrast seems to support the general belief among music students that violinists are less co-operative
than others. Obviously, this observation is made only in the best group, and subsequently attributed to all violin
players. But the group of good violinists has the lowest mean score, so the instrument contrast is not interpretable as a main effect. The question arises as to the origin of the difference. If it were due to the competence level, the same effect should be observed in the double-bassists, but there the difference is insignificant. Male and female violinists show similar patterns in this scale, so the different proportions of the genders in the two instrument groups cannot account for the finding. My interpretative approach to this question was to
examine the age at which the musicians began to play their instrument. The violinists began at an average age of
6 years with a variance of 1.3 years. The earliest starting age in my sample was 3, the latest was 9. The
double-bassists began at an average age of 10 years with a variance of 4.2, much larger than that for the violinists. The earliest starting age for the double-bassists in my sample was 3, the latest
20. It is important to note that these figures do not refer to the double-bass itself, but to the first instrument they
played. This first instrument could be anything at all. The double-bassists in my sample had played instruments ranging from drums and trombone to flute, violin and saxophone. Many of the girls had played the harp. The average
age at which they started to play the double-bass itself was 16 years. The variance of 2.5 for the figure "starting
the double-bass" is small in comparison to that for "starting to play an instrument", but still much larger than that
for the violinists. The minimum and maximum age exhibited quite a large gap: the youngest was 11 years old, the
oldest 30. Ericsson et al. (1991, 1993) have established that the "best" violinists differ from the "good" violinists in that they do a much larger amount of practice during early adolescence. The role of teachers, parents and peers in encouraging them in these activities is a very significant one (Davidson et al., 1996; Sloboda, 1997; Sloboda &
Howe, 1991). If progression in expertise brings praise from such people, it in turn increases self-confidence and
possibly also the level of dedication, leading to a form of "virtuous circle". During this long period such
development likewise goes hand in hand with increased levels of dominance, tendency to exhibitionism,
consciousness of status, competitiveness, etc. The violinists had already been undergoing this development for a
substantial period in early adolescence, a very sensitive phase of personality development, before the
double-bassists had even started to play their "main instrument".
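The geometry behind the contrast argument above can be checked directly: a single best-violinists contrast, c = (3,-1,-1,-1), correlates positively with all three ANOVA contrast vectors, so one genuine group contrast can surface as three nominally significant effects. A minimal sketch:

```python
import numpy as np

# Group order: (best violin, good violin, best bass, good bass).
c = np.array([3, -1, -1, -1])          # best violinists vs. the rest
contrasts = {
    "instrument":  np.array([1,  1, -1, -1]),
    "competence":  np.array([1, -1,  1, -1]),
    "interaction": np.array([1, -1, -1,  1]),
}
for name, v in contrasts.items():
    r = np.corrcoef(c, v)[0, 1]        # correlation with the c contrast
    print(f"{name}: r = {r:.3f}")      # positive in every case
```

Because every correlation is positive (and in fact identical by symmetry), a mean pattern driven solely by the best-violinists group loads on all three tests at once.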
Conclusion
The TICOM study established that the best violin students (those who had the best chances of gaining a place in
an orchestra later on) exhibited differences from "good" violin students in personality traits which are closely
connected to greater degrees of individualism. These differences, however, are at odds with the requirements of
the profession of orchestra musician.
This expertise effect on personality, along with the heavy emphasis on solo work at the music academy, can lead
to many an orchestra violinist coming down to earth with a bump once successfully past the audition stage and
hence past the point at which success is measured almost exclusively by solo ability. Carl Flesch (1929), the
doyen of modern violin pedagogy, comments as follows on the rude awakening of the orchestra violinist: "Nearly
every orchestra violinist, once upon a time, has dreamed of becoming a celebrated soloist. [...] An orchestra
violinist of this type, therefore, will and must always be discontented with his lot."
The dilemma for the training of professional musicians, then, is: how to devote adequate attention, within violin
lessons, both to perfecting musical / technical skills on the instrument and to the demands of the students' later
working life? Studying orchestral playing as a subsidiary subject within the academic music course cannot
replace the systematic promotion of difficult orchestral works by the violin professor in a one-to-one teaching
situation. There exists the need to arouse interest among music students for orchestral as well as solo repertoire,
as it is orchestral literature with which most of them will primarily occupy themselves in the course of their work
in the orchestra. A further problem facing musical training is the common belief or prejudice in music academies
that only solo works, such as violin concertos, are suited to provide the musical and technical training
professional musicians require. In fact, orchestral repertoire contains numerous works more than suitable for the
training of such technical and musical skills; one example of such a suitable work would be Strauss'
"Rosenkavalier". We observe within the training of musicians, then, a form of "two-tier system", the two
"classes" being soloists and orchestra musicians. The tendency is to view orchestra musicians as "the less
competent soloists", whereas in fact these two areas represent two different tasks which require somewhat
different skills. A potent illustration of this prejudice is provided by Flesch's defensive acknowledgement of these
difficulties: "Although the frequent incidence of teachers neglecting to disabuse the student of his illusions is
from an ethical point of view not altogether above criticism, we cannot, at least not from a practical standpoint,
reproach teachers with this fact, as they are, after all, principally charged with the task of providing our orchestras
with suitable new talent." We see, then, that Flesch, while he is aware of the ethical problems involved in the
training by solo pieces of the musician almost certainly destined for orchestral work, is himself of the opinion
that the only "proper" way to train violinists is by the use of solo works. This conservative mode of thought
regarding musical training is certainly a difficulty to be overcome if these problems are to be solved. However, it
would constitute an injustice to place all the blame at the door of the music academies; the orchestras themselves
appear to continue to serve this prejudice, basing their auditions on the playing of solo pieces. In this sense,
music academies and other musical training institutions are simply carrying out their task - preparing students for
the demands of the application and audition process, i.e. enabling them to have a realistic chance of a place in the
orchestra - in precisely the correct manner. The onus, then, would be on the orchestras to change their practices.
The training could only then represent a better preparation for the actual profession of orchestra musician, in
which most students who go through it will eventually earn their living (as opposed to being soloists), if the
orchestras' requirements and practices at audition were not so at odds with the reality of working life for most
musicians who enter this profession.
The study has demonstrated that the practice of searching for the conventionally "best" violinists, relying on
audition methods which concentrate on solo playing to the exclusion of all other skills, naturally increases
the likelihood that those players eventually selected for the orchestra will, similar to the "best violinists" group in
our study, be those who tend to have difficulties in adapting to a group situation and be, to refer again to Flesch,
"discontented with their lot". Nevertheless, the study works with average values. This means that within this
"best violinists" group there will be violinists who have more and less difficulty in adapting to a group, i.e. more
"individualist" and more "collectivist" types: that is to say, there do exist violinists who combine very high levels
of musical and technical competence with skills of social competence in group conditions. Thus it would make
sense to introduce a systematic element into the audition process that is capable of selecting those with both the
highest level of competence and the greatest ability to adapt to groups. My suggestion, therefore, is to include in
the audition process a test of ability to integrate speedily into a group. One possible way of testing such ability,
which would be straightforward to arrange, would be to assemble a quartet from the orchestra, remove the second violinist, replace him/her with the applicant, and then present the applicant with a piece of
music with which s/he is unfamiliar. This procedure would test both sight-reading ability and competence in
adaptation to the group, which are precisely the abilities required in the everyday working life of an orchestra
member. Old habits die hard: conservative modes of thought which suggest the solo work is the only adequate
means of training a future orchestra member, or testing the orchestra applicant on his/her competence at audition,
are clearly still very powerful within the musical world in Europe. The Metropolitan Opera in New York, on the other hand, has already effected radical changes in the audition process. Aware that it is not necessarily "soloist" types who are most fitted for orchestral work, it now leaves solo works out of the audition procedure completely,
instead preferring to test applicants on orchestral excerpts at each stage of the audition. Should a musician not
show sufficient interest in such excerpts to prepare them properly for audition, s/he finds him/herself failing the
audition in the first round.
If even the majority of orchestras were to take up a similar practice, musical training would be freed from the
necessity of preparing students for an audition solely concentrating on solo works, and would consequently be
able to provide the students with a training more closely resembling the reality of their future orchestra career.
Likewise, the procedure I am suggesting would bring the requirements of the audition closer to the demands of
the real work situation, which would in turn have a positive effect on musical training. However, music
academies cannot be merely passive recipients of "knock-on" effects from changes in the orchestras. They have
the potential and the duty to arouse students' interest in orchestral works, which they could achieve by widening
the training repertoire beyond solo concertos. A concluding remark on musical training's paradoxical and
somewhat bizarre attitude towards orchestra playing, which demonstrates beyond doubt how much change in
attitudes continues to be necessary: Particularly good students are excused from having to take part in the
orchestra - presumably to allow them more time to perfect their solo skills.
__________________________________________________________________________
Davidson, J. W., Howe, M. J. A., Moore, D. G., & Sloboda, J. A. (1996) The role of parental influences in the development of
musical ability. British Journal of Developmental Psychology, 14, 399-412.
Ericsson, K. A., Krampe, R., & Tesch-Römer, C. (1991). The role of deliberate practice in the acquisition of expert performance (Technical Report 91-06). Boulder, CO: University of Colorado, Institute of Cognitive Science.
Ericsson, K. A., Krampe, R., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100, 363-406.
Flesch, C. (1929) Die Kunst des Violinspiels. Berlin: Ries & Erler.
Griffing, J. (1994) Audition procedures and advice from concertmasters of American orchestras. Ohio State University, UMI ON:
9427653
Noltensmeier, R. (1997) Grosse Geigenpädagogen im Interview. Kiel: Götzelmann.
Triandis, H. C. & Gelfand, M. J. (1998) Converging measurement of horizontal and vertical individualism and collectivism. Journal of Personality and Social Psychology, 74, 118-128.
There are two theories about the way we perceive and represent rhythm in music. One theory identifies an "internal clock", activated at
the onset of musical sounds. Grouping is created when note events map onto a pulse-grid. (Clynes & Walker, 1982; Drake & Gerard,
1989; Handel & Lawson, 1983; Pouthais, 1996; Pouthais, Provasi, & Droit, 1996; Povel, 1981; Povel & Essens, 1985). A second theory
states that we organize durations into figural groups, based on the structural aspects of the durations themselves, possibly unrelated to an
underlying pulse. Boundaries between groups are formed by elongated note events, changes in loudness, texture, melodic direction, or
by the repetition of a pattern (Bamberger, 1978, 1991; Deutsch, 1980, 1982; Lerdahl & Jackendoff, 1983; Upitis, 1987).
Studies of pre-natal and infant children point toward a natural tendency to organize sounds around an internal clock mechanism.
Pouthais (Pouthais et al., 1996) explored biobehavioral rhythms in infants and found several behaviors that are performed on a steady
pulse. Infants, furthermore, could manipulate these behaviors, altering the tempo of performance without losing consistency, in order to
control their environments. Other studies record rhythmic behavior and young children's to move in synchronicity to music for short
periods of time. (For a discussion of such studies, see Pouthais, 1996)
However, young children have not demonstrated an ability to sustain synchronized movement with music, or to perform a steady beat
for any length of time, until they are between the ages of three and five years (Davidson, McKernon, & Gardner, 1977; Dowling & Harwood, 1986; Frega, 1979; Jones, 1976; Moog, 1976; Perney, 1976; Rainbow & Owen, 1979; Serafine, 1979; Upitis, 1987). It is
possible that this inability could be attributed to undeveloped gross motor coordination, or it may be possible that there is a
developmental trend in the induction of an internal clock in the performance of musical rhythm.
Bamberger (1978, 1991) and Upitis (1987) suggested that there is a developmental trend in young children, from grouping by rhythmic figure to figural grouping plus internal-clock (formal) grouping. Evidence of discrimination between different rhythmic figures
is evident in many studies, even when tempo and pitch are altered (Chang & Trehub, 1977; Clifton, 1974; Donohue & Berg, 1991; Stamps, 1977). Bamberger's explorations of graphic representation (1978, 1991) and Upitis's examinations of rhythmic development
(1987) indicate the presence of grouping by structural boundaries in younger children, moving to grouping by metric characteristics in
older children. Their findings invite investigation into other musical behaviors.
The primary goal of this pilot study was to develop a tool to examine the vocal performances of young children to determine whether
Bamberger and Upitis's findings are supported through performance behaviors. The secondary goal of the research was to begin to
explore the characteristics of the performances for indications of other tendencies that may help to categorize the representation of
rhythm by young children. Three research questions guided the investigation:
1) Is there any evidence of a steady pulse in pre-school children's vocal performances?
2) Is there evidence of organization into rhythmic phrases?
3) Can evidence for either type of rhythmic behavior be categorized for further investigation?
Method
A pilot test in vocal performance was developed for three- to five-year-old children. Vocal performance was chosen because preschool children seem to be able to sing before they are capable of performance tasks which require gross motor coordination (Drake, Dowling, & Palmer, 1991; Hargreaves, 1986; McDonald & Simons, 1989; Webster & Schlendrich, 1982).
A stimulus tune was constructed to allow for grouping by figure or by internal clock. A two-phrase tune with the longest duration between similar phrases was judged to allow for examination of timing in performance. Note values within sub-phrases had a 2:1 proportion (quarter note = 563 ms, eighth note = 281 ms), the ends of sub-phrases a 3:1 ratio (818 ms), and the boundary between phrases a 4:1 ratio (1094 ms). If it were true that young children organize rhythmic information figurally, then they would accurately recreate the ratios within the phrases (Fraisse, 1982) but would produce variable results between phrases.
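The stated ratios can be checked against the millisecond values, which come out slightly under the nominal 3:1 and 4:1 (a short sketch, using only the durations reported above):

```python
# Durations (ms) of the stimulus tune, as reported in the text.
quarter, eighth = 563, 281
sub_phrase_end, phrase_end = 818, 1094

print(round(quarter / eighth, 2))         # within sub-phrases: ~2:1
print(round(sub_phrase_end / eighth, 2))  # sub-phrase boundary: nominal 3:1
print(round(phrase_end / eighth, 2))      # phrase boundary: nominal 4:1
```

The within-phrase ratio is almost exactly 2:1, while the boundary durations work out to roughly 2.9:1 and 3.9:1 against the eighth note, i.e. the nominal 3:1 and 4:1 categories.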
As expected, the timing charts indicate that performances were more internally inconsistent than the tempo graphs suggested. Timing of the sub-phrases was clearly more accurate than phrase endings, and it also became clear that the eighth notes in each phrase became progressively longer toward the end of each phrase. Individual performance timing charts revealed a tendency to overcompensate for
Proceedings paper
Broadening the Concept of Music Aptitude: New Approaches to Theory and Practice
Peter Webster
School of Music
Northwestern University
I am quite interested in the study of children's musical thinking, especially the kind of thinking that is generative (Sloboda,
1988). I believe that, as researchers, we should study both the natural occurrence of such thinking and its development. Because
of this, I am naturally interested in music aptitude as a focus of study.
In the first portion of the paper, I argue for a broadening of the traditional view of music aptitude and its assessment. Next, I offer
examples of research that show promise for such expansion; this work is experimental and awaits substantial, sustained study. I
place special emphasis on creative thinking in music as an important avenue for extending our view of music aptitude. Finally,
the paper concludes with some speculation about music technology's role in expanding our conception of how aptitude might be
assessed procedurally.
When music aptitude is considered an attribute that people possess in varying degrees, it has strong overtones
of being more "mentalistic" than "behavioral," a characteristic that music ability tends to suggest. Perhaps
this is because most music aptitude tests primarily involve discrimination tasks, which rely on perception and
cognition of differences between pairs of musical stimuli. (Boyle, 1992, p. 249)
Perhaps the answer lies in a refinement of our understanding of music aptitude. For me, music aptitude represents a set of
constructs that relate not only to the ability to "audiate" tonal and rhythmic patterns and to make simple preference choices but also to the ability to think with and to manipulate larger musical wholes. I am also convinced that such a construct set must
account for the ability of the individual to manipulate expressive elements of music. At the core of these notions is that success in
music can best be predicted if we design musical tasks that tap a deeper level of "thinking in sound" than the present
discrimination and preference tasks. The full story of what music aptitude is cannot be fully understood until we know more
about both the traditional convergent tasks as well as the more divergent, problem-solving tasks that require more holistic mental
processing. This combination of skills is what our art is all about.
I believe that Seashore knew this. In his writings about the factors that comprised music aptitude, he indicates this clearly, along
with other comments about music memory, imagination, emotional reaction and expressivity. (Seashore, 1919, pp. 211-235) His
Seashore Measures of Musical Talents (Seashore, 1960), first developed in 1919, did not reflect these beliefs in part because of
the dominance of sensation as a major approach to psychological measurement at the time (Humphreys, 1993). He was taken to
task by Mursell (1937) within the context of the famous "omnibus" vs. "theory of specifics" debate, but this was less about the
nature of the exact constructs and more about (1) whether there was a general factor of music ability versus a constellation of
factors and (2) if it was wise to isolate factors from musical context. Mursell did not suggest an alternative to music aptitude tests
of specific skills, but his criticisms did raise interesting questions about how we should think about identifying potential. In the
years that followed this debate, we have had refinements of Seashore's work with more context-sensitive test items, but no real
fundamental change in the basic approach of discrimination and preference of relatively simple tasks.
Figure 1
The repeated theme forms a primitive hierarchic structure: the whole is formed by grouping of groups. This is thought to
be a general and basic musical property. Only structural changes are made in the 'answer' part of an item, there may be
changes in the sequence or amount of tones, but the same tones are used. If, for instance, there were new, different pitches
in the answer part it would be easy to notice the change by mere recognition, without conceiving the structure. (Karma,
1984, p.28)
This assessment approach celebrates a somewhat higher level of mental processing than comparing two separate patterns. What
is interesting about this work is that it represents not only a higher level of musical thinking, it also allows for the inclusion of
other musical elements such as dynamics (accents). Further extension of this approach might include experimentation with tone
color and articulation.
Analogical Thinking
Nelson, Barresi and Barrett (1992) were interested in testing children's ability to solve musical problems in an analogical setting.
Citing the importance of analogical reasoning in general intelligence, this group reasoned that the same approach might be
applied to auditory processing. The research questions in this study centered on the developmental trends in musical and spatial
analogical thinking tasks and how these results compared with Gordon's measures.
What is fascinating about this study is the construction of the analogical tasks in music. Figure 2 represents one kind of spatial
task that uses analogical thinking. Here, the child must encode, compare, and evaluate identical features of shape. Size and color
were used in more advanced tasks. Figure 3 represents the same kind of mental operations but applied to musical material.
Figure 2
Figure 3
As with Karma's work, the researchers here tax a more holistic process than simple discrimination of patterns. The musical tasks
demand that the individual encode, compare and evaluate one to four perceptual changes in the auditory examples. Melodic
patterns are used in Figure 3, but one can imagine tasks like this that include rhythm, tone color, harmony, dynamics and other
kinds of musical material.
Sensitivity to Expression
The importance of expression as an important part of musicality was noted by Seashore in his text on the psychology of music
(Seashore, 1938). Gordon devoted a section of his Musical Aptitude Profile (Gordon, 1965) to a group of three subtests that
evaluated discriminations of phrasing, balance, and style. Each task asks an individual to choose between a pair of short musical
examples.
Until recently, little work had been done to expand the study of expression as part of aptitude. Rodriguez (1995) reviewed the few studies that do exist, particularly in adults, and designed a series of interesting experiments with young children. Using
MIDI-based technology, Rodriguez (1998) placed expressive and mechanical performances of tunes side by side for children to
compare. He used a "2+1 oddity paradigm" as a presentation strategy, in which either one mechanical and two expressive versions of the same tune were used, or two mechanical and one expressive. The children interacted with a touch-sensitive computer screen to audit the three versions, then selected the one that was different in the set of three. Twenty sets of tunes were given to 60 children (20 Kindergarten, 20 Grade 2, and 20 Grade 4). Each age group showed evidence of being able to do the task (above chance), with the ability rising by age level.
Rodriguez also experimented with production tasks by having the children guide the performance of well known tunes by
controlling the tempo and loudness using a simple MIDI trigger that played successive musical events of the song. These
performances were saved and judged by experts for their expressiveness. He also engaged the children in a discussion of what
they did.
Work in the study of creative thinking ability in school-aged children is a relatively new venture. I have summarized much of the
empirical work by dividing studies into those that address content (product and process) and those that follow the psychometric
tradition (Webster, 1992). Since 1992, the work has continued with greater emphasis on compositional and improvisational
thinking. Not all of this work is aimed at broadening the assessment of music aptitude per se, but much of it has underscored
the importance of understanding creative thinking as a vital part of music teaching and learning.
How does this imaginative thinking relate to the big picture? I have found the model pictured in Figure 4 to be useful in my
thinking about creative thinking in music and attempts to measure it as a function of music aptitude (Webster, 1987a). Such
attempts at conceptual modeling are useful for teachers and researchers. They suggest relationships between variables that
imply possible teaching strategies and give direction to research. They can also generate a platform for debate in the profession
which is always a healthy sign. This model is designed to be representative of both child and adult creative thinking, although
certain aspects of the model might be qualitatively different at various stages of development.
I have presented a description of the model elsewhere (Webster, 1987a). Important in this paper is the notion that enabling skills
include aptitudes that are both convergent and divergent in nature. To operationalize these ideas, I have created a measure that
uses a microphone for amplifying the voice, a round sponge ball with a piano, and a set of temple blocks to engage children in
musical imagery (Webster, 1987b) (see Figure 5). Called Measures of Creative Thinking in Music (MCTM), the activities begin
very simply and progress to higher levels of difficulty in terms of divergent thinking. There are no right or wrong answers
expected.
Figure 4
Figure 5
The first section is designed to help the children become familiar with the instruments used and how they are arranged. The
musical parameters of "high/low", "fast/slow", and "loud/soft" are explored in this section, as well as throughout the measure.
The way the children manipulate these parameters is, in turn, used as one of the bases for scoring. Tasks involve images of rain
in a water bucket, magical elevators, and the sounds of trucks.
The middle section asks the children to do more challenging activities with the instruments, focusing on the creation of music
with each instrument singly. Children enter into a kind of musical question/answer dialogue with the mallet and temple
blocks, and create songs with the round ball on the piano and with the voice and the microphone. Images used include the
concept of "frog" music (the ball hopping and rolling on the piano) and of a robot singing in the shower (microphone and voice).
In the final section, the children are encouraged to use multiple instruments in tasks whose settings are less structured. A space
story is told in sounds, using drawings as a visual aid. The final task asks the children to create a composition that uses all the
instruments and that has a beginning, a middle, and an end.
This measure, and others like it, yield scores for such factors as musical originality, extensiveness, and flexibility, as well as
musical syntax. Measurement strategies are based on the careful analysis of video tapes of children actually engaged in the
activities. Objective criteria as well as rating scales are used.
Results based on over three hundred children have been encouraging. MCTM reliability and validity data seem to suggest
consistent patterns of response and appropriate task content. The tasks are not measuring the same skills as traditional musical
aptitude tests (tonal and rhythmic imagery) nor are they significantly related to general intelligence. There seem to be no
differences in scores in terms of gender, race, or socioeconomic background.
Perhaps the most important point surrounding this work is that what was once thought to be unapproachable and mysterious is
now being studied. We are beginning to have the tools to evaluate the development of creative thinking as a musical aptitude, as
well as aptitudes that are based on convergent perception skills.
Technology can help us provide better assessment systems. In the work of Rodriguez, we saw the use of a computer to provide children with expressive
and non-expressive musical examples. The computer allowed the child to hear and re-hear these examples and to make a
judgment about the content in a non-timed environment. Today's music technology provides rapid access to CD-quality examples
that can be heard again and again. It does not take much imagination to sense how the work of Karma and Nelson et al.
could be enhanced with the use of such an approach.
Computers provide a variety of possibilities for assessing aptitude by providing many different kinds of musical problems to
solve. My own work with the MCTM could be designed as a set of tasks delivered by computer with MIDI equipment. Hickey
(1995), Daignault (1997), and Younker (1997) have all used MIDI equipment and computer software to study children's
composition in terms of product and process. Using the computer, these researchers were able to record children's responses to
musical tasks in the form of MIDI data, which could be studied carefully for both product and process variables. Although their
work was not focused on aptitude as a construct, the techniques employed could easily be adapted for this purpose.
Commercial software already exists that holds enormous potential for the study of children's music aptitude with
performance-based tasks. Music Mouse (Spiegel, 1990) and Morton Subotnick's Making Music (Subotnick, 1995) are two
programs that offer children a chance to make music without any previous experience. Such tools allow children to demonstrate
a great deal of natural ability to "think in sound" as they rely on their inner imaginations.
I can imagine an Internet-based assessment tool that would use these techniques and others to provide children with interesting
music problems to solve while recording their gestures and responses to questions. If done well, with a reliable and valid design,
such a tool could provide invaluable help for music teachers and researchers. Its existence would be a logical extension of the
psychometric tradition, findings in cognitive science, current philosophies of learning, and technological resources.
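One way such a tool might record responses is as timestamped trial records that can be serialised for later scoring. The sketch below is a hypothetical design only; the function and field names are invented, not taken from any existing system.

```python
# Hypothetical response log for a web-delivered assessment task: one
# timestamped record per trial, serialisable (e.g. as JSON) for later scoring.
import json
import time

def record_response(log, child_id, task, response):
    """Append one timestamped trial response to an in-memory log."""
    log.append({
        "child": child_id,
        "task": task,
        "response": response,
        "timestamp": time.time(),  # seconds since epoch, for latency analysis
    })

log = []
record_response(log, "C014", "oddity_trial_3", "version_B")
print(json.dumps(log[0]["task"]))
```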
Summary
The psychology of music, indeed all of psychology, has grown enormously in method and substance since Seashore's death. I
think, if he were alive today, Seashore would be fascinated with the tools and methods that we now have to study musicality, but
would be impatient with our slowness to explore these possibilities. I believe he would be urging us to push forward with the
study of the "higher-level" music abilities such as those connected with creative thinking.
The present interest in cognitive science, in studying mental representations and mental structures by presenting problems that use
more complex abilities than those required for Seashore's early measures, would have intrigued Seashore the scholar. He
certainly would be swept away by the role of technology: video tape, digital sound, the speed and affordability of personal
computers, MIDI, simulations, AI programming, multimedia, and the Internet. I believe that Seashore would return to his
measures and write far different ones than those of his time.
References Cited
Boyle, D. (1992). Evaluation of music ability. In: R. Colwell (Ed.), Handbook of research on music teaching and
learning. (pp. 247-265). New York: Schirmer Books.
Colwell, R. (1970). The evaluation of music teaching and learning. Englewood Cliffs: Prentice-Hall.
Daignault, L. (1997). Children's creative musical thinking within the context of a computer-supported
improvisational approach to composition. (Doctoral dissertation, Northwestern University, 1997). Dissertation
Abstracts International, 57, 4681.
George, W. (1980). Measurement and evaluation of musical behavior. In D. A. Hodges (Ed.), Handbook of music
psychology, (pp.291-340). Lawrence, Kansas: National Association for Music Therapy.
Hedden, S. (1982). Prediction of music achievement in the elementary school. Journal of Research in Music
Education, 30, 61-68.
Hickey, M. (1995). Qualitative and quantitative relationships between children's creative musical thinking processes
and products. (Doctoral dissertation, Northwestern University, 1995). Dissertation Abstracts International, 57, 145.
Humphreys, J. (1993). Precursors of musical aptitude testing: From the Greeks through the work of Francis Galton.
Journal of Research in Music Education, 41, 315-327.
Karma, K. (1973). The ability to structure acoustic material as a measure of musical aptitude. 1. Background theory
and pilot studies. Research Bulletin, 38, Institute of Education, University of Helsinki.
_________ (1982). Validating tests of musical aptitude. Psychology of Music, 10 (1), 33-36.
_________ (1983). Selecting students to music instruction. Bulletin of the Council for Research in Music Education.
75, 23-32.
_________ (1984). Musical aptitude as the ability to structure acoustic material. International Journal of Music
Education, 3, 27-30.
Lehman, P. (1968). Tests and measurements in music. Englewood Cliffs: Prentice-Hall.
Mursell, J. (1937). What about music tests? Music Educators Journal, 24 (2), 16-18.
Nelson (1992). Musical cognition within an analogical setting: Toward a cognitive component of musical aptitude
in children. Psychology of Music, 20.
Rainbow, E. (1965). A pilot study to investigate the constructs of musical aptitude. Journal of Research in Music
Education. 13, 3-14.
Rodriguez, C. (1995). Children's perception, production, and description of musical expression (Doctoral
dissertation, Northwestern University, 1995). Dissertation Abstracts International, 56, 2602.
_____________ (1998). Children's perception, production, and description of musical expression. Journal of
Research in Music Education. 46, 48-61.
Seashore, C. (1919). The psychology of musical talent. Boston: Silver, Burdett and Company.
__________ (1938). Psychology of Music. New York: McGraw-Hill Book Company.
Shuter-Dyson, R. & Gabriel, C. (1981). The psychology of musical ability (2nd ed.) London: Methuen.
Sloboda, J. (Ed.) (1988). Generative processes in music. Oxford: Clarendon Press.
Spiegel, L. (1990). Music mouse [computer software]. Personal Distribution.
Subotnick, M. (1995). Morton Subotnick's making music [computer software]. New York: Learn Technologies
Interactive (formerly published by Voyager).
Webster, P. (1987a). Conceptual bases for creative thinking in music. In Peery, J., Peery, I. & Draper, T. (Editors).
Music and child development, (pp. 158-174). New York: Springer-Verlag.
_________ (1987b). Refinement of a measure of creative thinking in music. In C. Madsen and C. Prickett (Eds.)
Applications of research in music behavior. (pp. 257-271). Tuscaloosa: University of Alabama Press.
_________ (1988). New perspectives on music aptitude and achievement. Psychomusicology, 7, 177-194.
_________ (1992). Research on creative thinking in music: The assessment literature. In: R. Colwell (Ed.),
Handbook of research on music teaching and learning. (pp. 266-280). New York: Schirmer Books.
Younker, B. (1998). Thought processes and strategies of eight, eleven, and fourteen-year old students while engaged
in music composition. (Doctoral dissertation, Northwestern University, 1998). Dissertation Abstracts International,
58, 4217.
Proceedings abstract
Warsaw, Poland
psyche@chopin.edu.pl
Background.
Past research has shown that to apply an effective form of management of
music performance anxiety, especially psychotherapy or pharmacotherapy,
a differential diagnosis of the phenomenon is needed in relation to
the temperamental and personality characteristics of the musician as well as
to his or her musical background.
Aims.
The main goal of the study was to find out the nature and the dimensional
structure of performance anxiety.
Method.
The participants were 145 students from the Chopin Academy of Music in Warsaw, all
of them with 12-14 years of intensive training in special music schools. All
students were administered The Check List of
Results.
Proceedings abstract
defonso@ipfw.edu
Background:
Previous research (DeFonso & Kelley, 1994) found that aesthetic (like-dislike)
ratings as well as semantic differential-type judgments of singing voices were
related to the degree and rate of vibrato and the distribution of harmonics
within the vocal range. These judgments were made for samples from different
singers who were given no specific instructions on how they should produce the
excerpts.
Aims:
For the present study, acoustic factors contributing to the listeners'
judgments are more closely examined. The variability in the between-singer
samples is more effectively controlled by using just four singers, one for each
voice range (soprano, alto, tenor, bass), and then varying the vocal quality
within each voice.
Method:
The participants were 120 Introductory Psychology students with limited musical
background (as determined by questionnaire). Listeners made aesthetic and
descriptive judgments of 24 excerpts of America the Beautiful. Each of the four
singers produced six versions of the tune using two pitch levels, three types
of vibrato, an "edge" or not, and full-voice or not.
Results:
The acoustic properties of the voices were analyzed using Hypersignal software.
A discriminant analysis will be used to determine which descriptor properties
and acoustic properties contribute to judgments of liking/not liking. Based on
the results of the previous study, it is anticipated that aesthetic judgments
of different voicing styles within and between singers will be reflected in
differences in acoustic properties.
Conclusions:
Proceedings abstract
Mr Tobias Egner
t.egner@ic.ac.uk
Proceedings paper
Abstract
This study investigated the effects of melodic and harmonic coherence on sight-singing ability. Twenty-four experienced singers performed an interval singing task, and then sang at sight four novel pieces of music twice each, containing either easy or
hard melody and easy or hard harmony. Both harder melody and harder harmony increased errors. Error rate correlated with interval singing performance, indicating the importance of both pattern-recognition and harmonic prediction in sight-singing.
Singers made fewer errors on the second reading, indicating the importance of familiarity. A significant correlation between hesitation and overall error rate suggests an increasing role for internal auditory representations with increasing expertise.
Finally, less skilled sight-singers were significantly more affected by a disruption in harmony than better sight-singers. Auditory representations seem more important for sight-singing than for most instrumental sight-reading. The findings are discussed
in terms of a cognitive framework for pitch determination in sight-singing.
Introduction
The volume of research into sight-reading novel music has increased markedly over the last 15 years (see Sloboda 1984, 1985, Gabrielsson 1999 for reviews). Although sight-reading ability does not necessarily correlate with musical expertise (Wolf
1976, Waters, Townsend & Underwood 1998), it is nevertheless an important, even essential, skill for professional musicians to acquire (Sloboda 1978), particularly orchestral players, accompanists (Ericsson & Lehmann 1994) and choral singers.
Most sight-reading research has been carried out on pianists, few studies investigating other instruments, examples including the flute (Thompson 1987) and strings (Salzberg & Wang 1989). With the exception of Goolsby's research on eye movements
(Goolsby 1994a, 1994b), the handful of papers on sight-singers (e.g. Sheldon 1998, Demorest 1998) have generally investigated the outcomes of sight-singing rather than the processes involved.
Sight-singing and piano sight-reading differ, however, in a number of important ways. One main difference is the extent to which pitch production is internalised. Despite findings that internal auditory representations are involved in piano sight-reading
(Waters et al 1998, Townsend 1997), pianists can still translate the visual stimulus (score) into a motor response (pressing the correct piano key) without knowing the note's pitch internally. However, singers must know the sound of any note before its
production, and this presumably involves working out its pitch internally. Indeed, singers need a starting note when performing (in the absence of absolute pitch), whereas pianists do not. Another important difference is that, unlike pianists, singers
rarely perform by themselves, but more often with other singers, orchestral players or a piano. These other parts therefore potentially provide cues to the pitches to be sung. Certainly, there are no studies in the sight-reading literature where the pianist
hears other lines of music simultaneously. Personal experience and anecdotal evidence from singers suggests that these other parts are of great importance in determining the notes to be sung. The present paper is concerned specifically with the
influence of other parts on sight-singing ability.
Music sight-reading is normally thought of as a transcription task (cf. Shaffer 1978, 1982), such as copy-typing or reading aloud. Like reading text, it can be fractionated into cognitive sub-tasks and operations: perceptual processes (pattern-recognition,
expectation); translation processes from visual / auditory to motor responses; and the formation of auditory representations (e.g. Waters et al 1998). Pattern-recognition ability has been shown to be important in sight-reading. There is a strong
correlation between sight-reading skill and the ability to report groups of briefly presented notes (Bean 1938, Salis 1980, Sloboda 1976, 1978), perhaps akin to chess masters' ability to recall board positions (Chase & Simon 1973). This seems due to the
use of a more efficient and possibly qualitatively different mechanism (Sloboda 1984). Waters et al (1998) demonstrated correlations between pianists' sight-reading skill and single-note recognition speed, recall of briefly presented music, and
pattern-matching ability for music (Waters, Underwood & Findlay 1997). Pianists' eye-hand span has also been shown to be related to sight-reading ability (Furneaux & Land 1999, Sloboda 1974, 1977), as has flautists' eye-performance span
(Thompson 1987). Expert sight-readers' eye-hand span tends not just to be larger but also expands or contracts to coincide with phrase boundaries, suggesting the processing of higher-order structures (Sloboda 1984). Further evidence from
eye-movement research (e.g. Waters et al 1997, Waters & Underwood 1998, Goolsby 1994a, 1994b, Rayner & Pollatsek 1997, Truitt, Clifton, Pollatsek & Rayner 1997, Kinsler & Carpenter 1995) indicates that highly skilled readers scan larger units,
sometimes with briefer fixations, and that their fixation pattern is more likely to depend on the type of music being read.
Prediction and expectation of the subsequent note(s) to be performed are important cues. As musicians become more knowledgeable about the conventions of harmony and musical structure, they are more likely to employ prediction. In so-called
proof-reading errors, skilled pianists play what they expect rather than printed errors (Wolf 1976, Sloboda 1976). Priming studies have shown that one chord can influence the processing of subsequent chords (Waters et al 1998, Bharucha 1987,
Bharucha & Stoekig 1986). Furthermore, the formation of internal auditory representations is likely to be important in sight-reading, as shown by correlations between sight-reading ability and ability to memorise music from notation (Eaton 1978, Nuki
1984) and improvisation ability (McPherson 1995). Waters et al (1998) suggest firstly that auditory imaging may allow performers to monitor their reading and secondly that this auditory representation is needed for the priming and predictive ability
already mentioned. Ward & Burns (1978) have provided some evidence for the first of these suggestions by showing that auditory feedback is used by singers to keep themselves in tune, although removing auditory feedback only impairs pianists'
expressive aspects of performance (Repp 1999) and not their sight-reading ability (Banton 1995). It seems, then, that pattern-recognition, predictive ability and internal auditory representations are all integral to sight-reading ability.
Table 1
Percentage errors on sight-singing tasks as a function of harmony, melody and attempt.
[Table values not reproduced. Columns cross easy vs. hard harmony with easy vs. hard melody and attempt, reporting EN and EI error rates.]
Arcsin transformations were carried out (Hays 1993) on the proportion error data prior to running two 3-way (harmony x melody x attempt) repeated-measures ANOVAs, one for EI and one for EN. Harmony was highly significant for both EI (F(1,23)
= 42.91, p < 0.001) and EN (F(1,23) = 51.66, p < 0.001), less coherent harmony leading to more errors. Melody was also highly significant for both EI (F(1,23) = 62.71, p < 0.001) and EN (F(1,23) = 53.28, p < 0.001), with harder melody leading to
more errors. Significantly fewer errors were made on the second attempt than the first for EI (F(1,23) = 5.93, p < 0.05) and EN (F(1,23) = 5.46, p < 0.05). There were no significant interactions.
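The variance-stabilising step described above maps each error proportion p to arcsin(sqrt(p)) before the ANOVA is run. A minimal sketch with invented error proportions (the repeated-measures ANOVA itself is omitted):

```python
# Arcsin-square-root transform applied to proportion data before ANOVA
# (cf. Hays, 1993). The error proportions below are invented for illustration.
import math

def arcsine_transform(proportions):
    """Map each error proportion p to arcsin(sqrt(p)), stabilising variance."""
    return [math.asin(math.sqrt(p)) for p in proportions]

# Hypothetical per-condition error proportions for one subject:
error_rates = [0.05, 0.12, 0.30, 0.55]
print([round(v, 3) for v in arcsine_transform(error_rates)])
```

The transform compresses the scale near 0 and 1, where binomial variance shrinks, so that the ANOVA's homogeneity-of-variance assumption is better met.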
On the basis of overall error scores, the subjects were divided into two groups for sight-singing ability. Results are shown in Table 2. Two four-way (harmony x melody x attempt x group) mixed ANOVAs were now carried out, again on arcsin
transformed data, for EI and EN. Group was highly significant in EI (F(1,22) = 42.81, p < 0.001) and EN (F(1,22) = 40.24, p < 0.001), indicating a meaningful group division. Group by harmony was significant for EI (F(1,22) = 7.77, p < 0.05) and EN
(F(1,22) = 4.65, p < 0.05), less skilled readers being more affected by hard harmony than better readers. No other interactions approached significance.
Table 2
Percentage errors on sight-singing tasks, splitting subjects into two groups according to sight-singing ability.
[Table values not reproduced. Rows split subjects into more and less skilled readers; columns cross harmony, melody and attempt, reporting EN and EI error rates.]
Two Spearman correlations compared the number of hesitations with overall sight-singing performance, for EI and EN. Unfortunately the hesitation data for two subjects were lost, so this analysis used data from 22 subjects. This correlation proved to
be significant for both EI (rs = 0.4363, p < 0.05) and EN (rs = 0.5129, p < 0.02). Less skilled sight-singers made more hesitations.
Two final sets of Spearman correlations were carried out, comparing interval test scores, overall sight-singing error scores, the number of years' singing experience, the number of years playing the piano and the total number of instruments played. The
results are shown in Table 3. More skilled sight-singers tended to perform better on the interval test (EI rs = .5820, p < 0.01; EN rs = .5644, p < 0.01) and have more singing experience (EI rs = .4466, p < 0.02; EN rs = .4486, p < 0.02). The number of
years playing the piano correlated with the total number of instruments played (rs = .4723, p < 0.02).
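A Spearman correlation of the kind used in these analyses can be computed from average ranks, as in the following minimal sketch; the per-subject counts below are invented, not the study's data.

```python
# Minimal Spearman rank correlation with average ranks for ties.
# Data are illustrative only, not the study's.
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank for a tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

hesitations = [2, 5, 1, 8, 4, 7]       # hypothetical per-subject counts
errors      = [10, 22, 8, 35, 18, 30]  # hypothetical error percentages
print(round(spearman(hesitations, errors), 4))
```

Rank-based correlation is the natural choice here because hesitation counts and error percentages need not be linearly related, only monotonically.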
Table 3
Spearman correlation coefficients rs between test performance and musical training.
Discussion
This study investigated sight-singing performance as a function of melodic and harmonic difficulty. Our results indicate that interval singing performance correlates with sight-singing performance, and disrupting the sung line (i.e. harder melody)
impairs sight-singing performance. This melodic disruption tended to increase the size and variability of the presented intervals, interfering with more extended pattern recognition, such as scales or repeating motifs, and hence decreasing sight-singing
performance. In agreement with previous findings (Waters et al 1998, Sloboda 1977, 1984) that more skilled sight-readers have better pattern matching abilities, the findings suggest either that certain intervals are easier to read than others or that
prediction improves with increasing sight-singing skill. Disrupting the line even without altering the harmonic constraints of the piece impairs these predictive abilities (e.g. Waters et al 1998).
Disrupting harmonic coherence also impaired sight-singing ability. Modern, atonal music is much harder to sing, especially at sight, than tonal music from the classical or earlier periods, largely because tonal idioms give the singer the feeling of
"knowing where the piece is going". This feeling arises from composers' use of harmonic cadences and structures together with readers' predictive ability based on priming (Waters et al 1998). Prediction becomes harder in atonal harmony, and there is a greater likelihood of singing
incorrect notes. Furthermore, auditory feedback cannot be used to correct the note if the singer is either unaware of the error or does not have enough time available. Indeed, singers skilled at interval (pattern) recognition can transpose an extended
section of a piece (an EI error), sometimes without realising. This can only be corrected when the music reaches a point at which prediction makes the difference between the sung pitch and the required pitch clear and explicit. With the exception of
three subjects, all errors of this type were made in hard harmony conditions.
The role of auditory feedback can be seen both in the increased hesitation of less skilled sight-singers and in the "swoops" that less skilled sight-singers sometimes produce before eventually landing on the right note, suggesting that the formation of the auditory
representation develops with increasing experience and skill. Singers become less accurate in their tuning, more so for less experienced singers, when deprived of feedback from their voices by white noise delivered to the ears (Ward & Burns 1978).
The individual processes, such as interval recognition and the formation of auditory representations, develop with experience. More skilled sight-singers tend to have better interval singing (i.e. pattern perception), are better at multimodal integration
(combining the auditory cues and the visual score), and have better predictive abilities than less skilled sight-singers, although they are also more able to ignore harmony when it is atonal and therefore not helpful or even disruptive to pitch
determination. More skilled sight-singers also tend to produce fewer swoops and hesitations, suggesting a more developed internal auditory representation, together with a more advanced proprioceptive "muscle memory" in the vocal cords. They
probably also make more use of auditory feedback, even in atonal music.
The present findings suggest that the formation of auditory representations is an important skill in the development of sight-singing. Falkner, in his book on voice training, notes "It is a good habit to hear a note before it is sung" and ear training and
Acknowledgement
We would like to thank Burton Rosner and Vincent Walsh for their valuable comments and assistance during the preparation of this paper.
Proceedings paper
Dr. M. Hariharan
Principal, Bharatiar Palkalaikoodam
Center for Performing & Fine Arts
Pondicherry, India
and
The unique feature of the South Indian music system is its inimitable capacity to bring together two or
more musicians on a concert platform. Though of totally different minds and styles, artistes perform
together in unison, and a musical blend results. The contemporary performance genre is hardly 200 years
old.
The term performance is widely used to mean a Katcheri, but this is a misnomer. A performance implies
an agreement, a union by mutual consent, within which a harmonious combination of voices or
instruments takes place. Sometimes there is disagreement within the agreement, in the sense that the
performers freelance to an extent.
The period between the 8th and 15th centuries saw the advent of several performance-oriented
composers. Later, the delta region of Thanjavur in southern India became the centre of a cultural
transformation that has shaped current trends and settings. The first
concert hall, known as Sangita Mahal, was built at Thanjavur to foster performances.
It is interesting to note that only select instruments, such as the Flute, Yazh and Vina, were used for
performances. With the advent of the British, however, the Western violin emerged as an integral part
of the performance genre in the South Indian system of music.
During the end of the 18th and beginning of the 19th centuries, ample opportunities were provided to
artistes, in closed performance settings at palaces and wealthy houses, to exhibit their individual talents.
String and percussion instruments emerged as integral parts of a performance.
It was a psychological phenomenon that the two major aspects, the determinate and the indeterminate
(the recitative and the interpretative), became the warp and woof of the texture of the modern
performance. The genre emerged as a well-thought-out arrangement of classical music in both these
aspects, with performances divided into segments such as pre-heavy pieces, heavy pieces and
post-heavy pieces. The first was a mixture of the recitative and interpretative; the second was the
high-water mark of the artiste's talent; and the final section was non-classical or semi-classical,
intended to appeal to the audience in a lighter vein.
The present-day performance, despite the changes that have taken place, has its roots in the post-Trinity era.
The enormous body of music compositions in different formats and structures has helped present-day
musicians shape their concerts to please the varying aesthetic levels of the audience.
Today's performance stages are packed to full houses because of the liveliness provided by
younger-generation artistes who strive for innovation and experimentation.
The Millennium Pattern:
The recent trend has been the exposition of a performance in the Jugalbandhi pattern, a
trendsetter in mixing two contrasting and even contradictory traditions in a single flow of presentation.
Artistes of the vocal or instrumental traditions join together to present their own
ideas and imagination, catching the imagination of two different types of audience with
varying mind-sets. Here the musical dialogue between the South and the North meets and melts.
Yet another innovative presentation has been the Jazz Yatra, wherein the East meets the West or the
North meets the South. Of the two styles, one will be alien to the other, totally
different in cultural as well as aesthetic content. The two presentations, knit and bound together,
remain open for presentation in their own styles while at the same time keeping a
path common to both systems. In Jazz Yatra concerts, however, percussion takes the lead.
The Southern system has seen, above all, a cross-cultural exchange in its performance genres. The
Western system of music, for example, found its way into the South Indian musical system and became one
with it when presented together. The technique of presentation was such that the two systems crossed
their cultural limits and displayed the impact of each other. This experimentation was pioneered by the
internationally acclaimed European violinist Dr. Yehudi Menuhin together with M. S. Gopalakrishnan in the
early 1950s; later, several others such as Pandit Ravi Shankar and L. Subramaniam followed suit.
The limited attendance drawn by heavily billed artistes gave way to the emergence of an alternative
performance pattern: the more closeted chamber music concert. Here the audiences as well as the
artistes were very carefully chosen so that like-minded listeners and performers could interact. The
interaction between artistes and audiences was well known and had a long-lasting impact and
influence on both. In other words, chamber music concerts became very successful, with greater
emphasis on curricular learning activity.
Some time later, private commercially oriented organizations known as Sabhas organized a variety of
performances aimed at innovation and improvement, which brought to fruition both fusion and
ethnic-exchange performances. In the former, musical content and theme were fused within the same
texture or type of musical system. In the latter, people of different ethnic origins joined together to
perform their own indigenous cultural repertoires, but outside the same musical system.
Before the end of the millennium, fusion concerts had become very popular both in India and abroad.
They paved the way for innovative research into performance standards, in both appearance and
presentation.
But it was the musical festive seasons, or melas, that gave impetus to the wide acceptance of all types
of performances, in both quality and quantity. In the guise of variation, all manner of admixtures were
tried and effected in the performance patterns. In other words, interdisciplinary ideas crept into the
performance pattern, overlooking the tradition and history of the musical system. Audiences voted for a
change from the existing system, whether they wanted it or not, and several of today's leading
performers earned name and fame during this experimental phase. To cite an example, in the city of
Chennai alone more than 100 cultural organizations come to life during the music festivals and
otherwise lie dormant for the rest of the year. So too the artistes: a single artist may give more than a
dozen performances during the music season and, like the organizations, remain dormant for the rest of
the year. Such an overdose of performances, in both content and style, concentrated in a single season
should change, because the performance genre has lost its intrinsic and cultural values. It has become a
non-serious affair, with a non-serious audience and non-serious conduct.
Thanks to curriculum initiatives, many universities and institutions have created performance-oriented
courses and performance training areas to instil a serious approach to both preparation and
presentation. But there are several goals still to achieve in this training area. Modifications are needed
so that trainees are given ample freedom to choose their Gurus/Teachers on the path to becoming
accredited performers. Organizations and audiences should accept and encourage only those performers
who have been seriously and systematically trained through a proper method by a Guru or teacher.
Self-trained and self-appointed performers must be ignored and discouraged from performing.
The celluloid scene is drawing most of the audience to its fold, even where enthusiastic listeners are
unwilling to be wooed. The glamour and glory of colour and presentation in the film world have diluted
the intrinsic interest of audiences, and it has become difficult to win them back to the aesthetic circle. It
is high time to preserve the purity of performances along a traditional path, while at the same time
riding a new wave of emotional and cultural content. Organizations, cultural institutions, spiritual
leaders, elders, serious teachers, genuine students and the bureaucracy should all strive to standardize
and uphold the intrinsic and innate values of performance in the new millennium.
Several senior performers have ventured to start their own schools of performance, training serious
students and protégés as performers who reflect their own school of thought in an international
perspective. It is saddening to note, however, that in this area of institutionalization the teachers from
the North have gained more success than those from the South. The reasons for this rather
uncomfortable inadequacy are not encouraging.
Some of the leading advocates of innovative performance genres are notably middle-aged, and
sometimes very young, performers who have gained international acclaim. To name a few: Mandolin
Srinivas (who performed at the Barcelona Olympics and is a star performer at most international
festivals), Veena Gayatri (a child prodigy who gained immense popularity as a child artist at the age of
five), Master Sasank (a leading child artiste on the flute), Ravikiran (a child prodigy who entered the
performing stage at the tender age of three), vocalists such as Bombay Jaysree, Sowmya, Nithyasree of
celluloid fame, K.J. Yesudas (who has attained fame in both the traditional and celluloid fields), Sanjay
Subramaniam, Vijaya Siva and others.
Back to index
Proceedings paper
No repetition 16 10 3
Instant repetition 10 4 12
Separated Repetition 4 6 15
For the "no repetition" melodies, the frequency count was highest for "pitch salience" variations (i.e., new pitch, long durations, highest/lowest pitch). For the "instant repetition" melodies, the frequency
counts were highest for "repetition and grouping" variations (i.e., larger grouping, repeat melodic, repeat motivic). For the "separated repetition" melodies, the frequency counts also were highest for
"repetition and grouping" variations. It appears that the melodies with some form of repetition had the highest counts for variations guided by structural characteristics (i.e., larger grouping) of the
melodies, while the melodies with no repetition had the highest count for variations guided by specific musical characteristics of the melodies (i.e., introduction of new pitch).
What emerges from these variations in the decline of recall accuracy is a distinct overall profile for each of the three types of structural melodies. For the "no repetition" melodies, recall decreased from
the beginning to end of the passage. There were periodic variations that modulated the overall descent, but in general, the pattern overall descended. For the "instant repetition" melodies, a scalloping effect
(elevation at the beginning and end points) for the beginning and end motivic groupings of the first melodic idea and immediate repetition occurred, followed by a decline in recall to the end of the melody
(i.e., during the new melodic idea) with peaks occurring at major phrase boundary endings. An example of this type of recall may be seen in Figure 1. Finally, for the "separated repetition" melodies there
was high recall for the initial melodic idea followed by a decline for the secondary melodic idea, and then a sharp rise in recall when the initial melodic idea repeated.
These data illustrate that melodic recall is not simply sequential in nature, but rather, the structural characteristics of a particular melody (e.g., repetition, long duration) guide the consolidation and
subsequent performance of melodic material.
Back to index
Proceedings paper
Over the last several years, we have been examining the way in which trombone players move their right arm and
consequently the trombone slide to change the pitch of their instrument. In that work, we have found that
professional trombone players use less muscular activity than other adult performers to move the slide (Lammers,
1983), use their wrist more than nonprofessionals (Lammers & Kruger, 1991), and move the slide faster than
nonprofessionals (Kruger, Lammers, Stoner, Allyn, & Fuller, 1996). Professional and student performers were
also found to differ in the distance they moved the slide to reach each of the seven positions on the trombone
slide. Differences were most notable in the longest positions (Kruger, Lammers, Fuller, Allyn, & Stoner, 1997).
Professionals move the slide further and more accurately when reaching for the sixth and seventh positions. Both
professionals and students move the slide further between first and second position than is recommended by any
of the method texts we've examined.
We have been studying trombone performance with several goals in mind. One goal is to develop a better
understanding of expertise in skilled movements. Trombone performance is an excellent example of a natural
task with wide variability in performance. It is also interesting because accurate motion in itself does not
necessarily lead to musical performance. Consequently, it is interesting to see whether or not experts are able to
attend to musical demands more than other performers simply because they have automated the process of
moving the trombone slide to a greater degree. We have also been interested in using careful descriptions of what
skilled performers do in order to challenge or confirm the folk wisdom developed by teachers of trombone
performance. For example, we've found that performers with longer arms do not as a consequence perform better
than those with shorter arms. We hope to be able to develop a clear set of recommendations that will be
instructive to applied music teachers.
With this in mind the current study focuses on the performance of beginning trombone players. Their
performance will be compared to the performance of professional performers and college level musicians that
we've reported in our earlier work. We seek to compare the speed of their slide motion and their use of their
elbow and wrist with those of more skilled performers. The data we have collected are intended to allow us to explore
the extent to which differences between beginners and other performers are increased when additional demands are
placed on performance, e.g., by increasing tempo.
METHOD
Subjects
The present study was conducted as part of a larger program of research that examined expertise in trombone
performance. For that study, forty-two trombonists were recruited to participate in a study of arm motion during
performance. Of these, seven performers were beginners ranging in age from 10 to 14. These performers had
played the trombone between one and four years. All subjects signed informed consent forms and were paid for
their participation. Information about the age, experience, and arm length for all performers can be seen in Table
1 below.
Procedure
A special sleeve fitted with two Penny and Giles electrogoniometers was used to record movement in the elbow
and wrist. A 30.5 cm circular target made of poster board was attached to the end of each performer's trombone
slide. The motion detector was attached to a small stand placed in front of the performer and aimed at the modal
position of the target on the end of the slide. Each individual performed three study exercises and two musical
excerpts. For the present study, only performance data from the first exercise was utilized. This musical exercise
consisted of the B flat scale played ascending and descending at three different tempos: with the quarter note
equal to sixty beats per minute, with the half note equal to sixty beats per minute, and with the half note equal to
120 beats per minute. Additional information on the subjects and procedures can be obtained from Lammers et
al. (1996).
RESULTS
Analyses of Variance were run to examine the effect of the distance the slide was moved, the tempo of the
exercise, and the direction of motion on the duration of the slide movement. Level of group (professional,
college student, beginner) was included as a between subject variable. All other variables are within subject
variables. It should be noted that significance tests in all of the analyses reported below can only be taken as
suggestive because the sample sizes were very small. This was especially true for the beginners' group, in which
there were a significant number of trials in which data was missing. Some of the performers in this group rotated
their torso and the trombone slide from side to side to the extent that the transducer failed to provide accurate
readings.
The tempo of the exercise, the distance moved, and the direction of motion (away from the face versus toward
the face) each produced a significant difference in the duration of slide movement. Surprisingly, there was no
overall effect of level of expertise, even though the means clearly show
that beginners move the slide more slowly. Examination of the standard deviations suggests this is most likely
due to high variability within the beginners group and the small sample size. Figure 1 illustrates the differences in
mean motion time found as a function of level of expertise and the direction (away from the performer versus
toward the performer) in which the slide was moved. In this figure the movement time is averaged across
motions of differing lengths (1st to 2nd, 1st to 4th, 1st to 6th, 6th to 1st, 4th to 1st, and 2nd to 1st). A significant
interaction (p < .05) was found between level of expertise and direction of slide movement. Examining Figure 1
suggests that the difference in movement duration between inward and outward motions decreases as expertise
increases.
A similar Analysis of Variance with one between factor (level of expertise) and three within variables (distance,
tempo, and direction of motion) was run on the relative change in elbow angle. Changes in elbow angle were
recorded as changes in milli-volts (mV) by the Penny & Giles instrument used in this study. Figure 2 shows that
the variability in elbow angle was greatest for the long motion for all three groups. Similarity among the groups
decreased with increasing length of the motion. However, no significant effects of group, or interactions with
the group variable, were found in these analyses.
Figure 3 illustrates what appears to be a notable difference in elbow angle between more experienced and
beginning players when they move from first to sixth position. However, analyses of variance focusing only on
the motions from first to sixth fail to find significant differences as a function of expertise, tempo, or the
interaction between them. Examination of the distance of motion data suggests that this difference is largely
because the beginning performers simply never get their slides all of the way to sixth position.
DISCUSSION
Perhaps the most striking finding to be reported here is that the variability of beginning performers was
sufficiently large as to overwhelm the transducers we used to measure speed and distance of slide movement. A
strategy for improving this measurement system will be described below. An unexpected finding in the analyses
reported above is that performers move the slide faster when making an outward motion than when making an
inward motion. In this case, they are likely avoiding forcing the mouthpiece of the instrument into their
embouchure. It is worth noting that first position is the only position where the slide comes to rest against a
physical barrier. Consequently, it should be the position where uncertainty about where to place the slide is
lowest.
As reported in our earlier work, tempo and distance of motion produced the largest effects on the speed of slide
motion. This suggests that each performer moves the slide as fast as they need to depending on the requirements
of the motion. This finding is inconsistent with the proposal that performers develop a single automated and
crystalized motion from one position to the next that they call upon to position the slide.
The trends found in the data above suggest strongly that further study of beginning performers is needed. It
appears that the primary differences between beginners and other performers occur in the longer motions.
However, it is difficult to draw conclusions from the data we have gathered because our younger performers
simply moved their bodies too much from side to side. While this observation is interesting because it suggests
that students may need to gain better control over their instruments, it should also be possible to examine slide
motion independent of this side to side motion. We are currently working on a strategy that mounts our ultrasonic
transducer directly onto the end of the slide so that it constantly measures the distance between the slide end and
the bell of the instrument. We believe this should allow us to differentiate variability in actual distance of slide
motion from variability in overall body motion, resulting in more accurate estimates of slide motion for all
performers.
REFERENCES
Kruger, M., Lammers, M., Stoner, L. J., Allyn, D., & Fuller, R. (1996). Musical expertise: The dynamic
movement of the trombone slide. In S. J. Haake (Ed.), The Engineering of Sport. Amsterdam, The
Netherlands: A. A. Balkema.
Kruger, M., Lammers, M., Fuller, R., Allyn, D., & Stoner, L. J. (1997). Biomechanics of music
performance: Moving the trombone slide. National Association of College Wind and Percussion
Back to index
Proceedings paper
Communication of emotions is important both in everyday life and in the performing arts. Music is widely
acknowledged as an effective means of emotional communication. Studies on music performance, though, have
historically dealt almost exclusively with various structural aspects of performance and largely ignored questions about
emotional expression (for reviews; see Gabrielsson, 1999; Palmer, 1997). More recently, a number of studies have
shown that performers are able to communicate specific emotions to listeners via their expressive performances (e.g.,
Balkwill & Thompson, 1999; Behrens & Green, 1993; Gabrielsson, 1995; Gabrielsson & Lindström, 1995; Gabrielsson
& Juslin, 1996; Juslin 1997a; Juslin 1997b; Juslin & Madison, 1999; Ohgushi & Hattori, 1996).
The previous studies on emotional communication in music performance suggest that the performer's expressive
intention influences practically all factors in the performance. Emotional expression in music thus involves a whole set
of cues that are used by both performers and listeners (e.g., Juslin, 1997a). Different instruments provide different
means to express various emotional characters. Therefore it is of importance to use a wide range of instrumental
settings in studies of emotional expression in music. Until now, there has been no study that has used purely percussive,
non-melodic instruments. Behrens and Green (1993) included timpani in their study, but did not report any
measurements of acoustic characteristics.
The musical material of the present study consists of short rhythm patterns played on a set of electronic drums. This
design offers a more limited range of expressive cues to the performer when compared with earlier studies. Also, the
possible influence of melody on the expression is excluded. The general aim of this study was to investigate if
communication of emotions to listeners is possible under these circumstances, and more specifically to study the
listeners' recognition of the intended emotions, as well as the performers' use of the acoustic cues.
Methods
Two professional drummers were instructed to play three simple rhythm patterns (swing, beat, and waltz) on a set of
electronic drums so as to communicate specific emotional characters (angry, fearful, happy, sad, solemn, tender, and
expressionless) to listeners. The performances were recorded on tape and stored in computer memory.
Thirteen university students listened to all performances and judged them with regard to happiness, sadness, anger,
fear, tenderness, solemnity, and expressiveness. The judgements were made on scales from 10 to 0, where 10
designated maximum and 0 minimum of the respective attribute.
Analyses of the performances were conducted with regard to their acoustic characteristics, e.g. tempo, dynamics, and
timing. All measurements were made with the Soundswell software (Ternström, 1996). The mean tempo was obtained
by dividing the total duration of the performance, until its final note, by the number of beats, and then calculating the
number of beats per minute. The tempo variability was obtained by calculating the tempo for each consecutive beat
(quarter note), and then calculating the standard deviation for the tempo distribution. A measure of the sound level was
obtained by measuring the loudness equivalent level, as provided in Soundswell. The onset times for each note of the
performances were measured, and these values were used to calculate the deviation from mechanical performance for
each note. A mechanical performance is a performance with absolutely constant tempo and strict adherence to the
nominal ratios between different note values.
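The tempo and timing measures described above can be sketched in code. The following is a minimal illustration under the assumptions stated in the text (beat-level onset times in seconds; a mechanical reference performance with constant tempo and nominal note-value ratios); the function names and the onset data are invented for demonstration, and this is not the authors' actual analysis pipeline, which used the Soundswell software.

```python
from statistics import stdev

def tempo_measures(beat_onsets):
    """Mean tempo (BPM) and tempo variability (SD of per-beat tempo)
    from a list of beat (quarter-note) onset times in seconds."""
    # Inter-beat intervals between consecutive beats.
    ibis = [b - a for a, b in zip(beat_onsets, beat_onsets[1:])]
    # Instantaneous tempo for each consecutive beat, in beats per minute.
    beat_tempos = [60.0 / ibi for ibi in ibis]
    # Mean tempo: total duration (up to the final onset) divided by the
    # number of beats, converted to beats per minute.
    total_duration = beat_onsets[-1] - beat_onsets[0]
    mean_tempo = 60.0 * len(ibis) / total_duration
    return mean_tempo, stdev(beat_tempos)

def deviations_from_mechanical(onsets, nominal_onsets):
    """Percentage deviation of each played inter-onset interval from the
    corresponding interval of a mechanical performance (constant tempo,
    strict adherence to the nominal ratios between note values)."""
    played = [b - a for a, b in zip(onsets, onsets[1:])]
    nominal = [b - a for a, b in zip(nominal_onsets, nominal_onsets[1:])]
    return [100.0 * abs(p - n) / n for p, n in zip(played, nominal)]

# Example: four beats played at a strict 120 BPM.
print(tempo_measures([0.0, 0.5, 1.0, 1.5, 2.0]))  # (120.0, 0.0)
```

For a strictly mechanical performance the per-beat tempi are identical, so the variability term is zero; expressive performances yield a positive standard deviation.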
Results
Listening tests
Separate repeated measures ANOVAs in each emotion scale showed that the listeners' ratings differentiated in
accordance with the performers' intentions (F (6,72) = 4.83-32.91, p < .001). Multiple comparisons (Tukey's HSD)
showed significant differences (p < .05) for all pairwise comparisons between the intended emotion and the other
emotions within each emotion scale (except no expression). These comparisons show that the listeners in general
perceived the intended expressions.
Performance analyses
The two most successfully decoded (highest ranking of the adjective in question and low ranking of the
non-corresponding adjectives) performances of each of the intended emotional characters were chosen for analysis. The
results of the acoustic analyses are shown in Table 1.
Tempo
The happy versions were played the fastest and the sad versions the slowest. The fearful versions showed so much
variation of tempo that the mean tempo is less meaningful.
Dynamics
The angry versions were clearly the loudest followed by the solemn versions, while the sad and tender performances
had the softest sound level.
Timing
The performances were compared with regard to the amount of deviation from the note durations corresponding to the
nominal values as given in the notations. An overall measure of the amount of deviation was obtained by calculating
for each performance (a) the number of notes whose deviation was less than 5 per cent, (b) the number of notes whose
deviation was less than 10 per cent, and (c) the number of notes with more than 20 per cent deviation.
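Given the per-note deviations, the three band counts can be tallied directly. A brief sketch (the function name is hypothetical; following the text and Table 1, the first two bands are cumulative, so the under-10% count includes the under-5% notes):

```python
def deviation_bands(deviations):
    """Count notes by deviation band: (a) less than 5 per cent,
    (b) less than 10 per cent (cumulative, so it includes band a),
    and (c) more than 20 per cent."""
    under_5 = sum(1 for d in deviations if d < 5)
    under_10 = sum(1 for d in deviations if d < 10)
    over_20 = sum(1 for d in deviations if d > 20)
    return under_5, under_10, over_20

print(deviation_bands([2.0, 7.0, 25.0, 4.0]))  # (2, 3, 1)
```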
The fearful versions had by far the largest deviations, followed by the happy and the solemn versions. Except for the
fearful versions, deviations from mechanical tempo were rather small; most deviations were less than 10 per cent or
even less than 5 per cent. The no expression versions showed the smallest deviations.
Further, the ratios of dotted rhythm patterns were compared. The performance of dotted rhythm patterns is known to
vary depending on, for instance, tempo, musical style, and the performer's expressive intentions. For both the swing
and the waltz rhythms, the "long - short" ratio was higher in the happy versions than in the sad versions.
Table 1
Mean tempo (beats per minute); tempo variability (standard deviation); relative sound level [dB Leq; decibels relative
to the reference level (0 dB) set for the softest performance]; percentages of notes deviating less than 5 per cent, less
than 10 per cent, and more than 20 per cent from the nominal values; and the means of the dotted note ratios for the
two most accurately decoded performances of each intended emotional expression.
Expression  Performer/Rhythm  Tempo  Tempo variability  Sound level  Deviations: <5%  <10%  >20%  Ratio
Happy       A / Swing         192    5.80               7            48               61    12    2.71
Discussion
The results show that listeners generally perceived the intended emotions. The different expressive intentions
influenced the variables available to the performer (tempo, dynamics, and timing) in ways characteristic for each
intended emotion. The results of this study agree well with earlier studies of emotion expression in music as regards
both the listeners' decoding accuracy and the performers' cue utilisation.
All of the cues measured in this study are also available in non-verbal vocal expression. The patterns of cue utilisation
for tempo and dynamics correspond closely with the results for speech rate and intensity reported in the literature on
non-verbal communication of emotions in speech (e.g., Pittam & Scherer, 1993; Scherer, 1986) as regards happiness,
sadness, anger, and fear. The fact that the patterns of cue utilisation are stable across modalities could point toward a
common origin and/or similar mechanisms in communication of emotions in music and speech (cf. Juslin, 1997a). It
must, however, be pointed out that not all cues and emotions used in this study have been thoroughly investigated in
both modalities.
Acknowledgements
This research was supported by the Bank of Sweden Tercentenary Foundation through a grant to Alf Gabrielsson.
References
Balkwill, L.-L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music:
Psychophysical and cultural cues. Music Perception,17, 43-64.
Behrens, G. A., & Green, S. B. (1993). The ability to identify emotional content of solo improvisations performed
vocally and on three different instruments. Psychology of Music, 21, 20-33.
Gabrielsson, A. (1995). Expressive intention and performance. In R. Steinberg (Ed.), Music and the mind machine. (pp.
35-47). New York: Springer.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the performer's intention
and the listener's experience. Psychology of Music, 24, 68-91.
Gabrielsson, A., & Lindström, E. (1995). Emotional expression in synthesizer and sentograph performance.
Psychomusicology, 14, 94-116.
Juslin, P. N. (1997a). Emotional communication in music performance: A functionalist perspective and some data.
Music Perception, 14, 383-418.
Juslin, P. N. (1997b). Perceived emotional expression in synthesized performances of a short melody: Capturing the
listener's judgment policy. Musicae Scientiae, 1, 225-256.
Juslin, P. N., & Madison, G. (1999). The role of timing patterns in recognition of emotional expression from musical
performance. Music Perception, 17, 197-221.
Ohgushi, K., & Hattori, M. (1996). Emotional communication in performance of vocal music. In: B. Pennycook, & E.
Costa-Giomi (Eds.), Proceedings of the Fourth International Conference on Music Perception and Cognition. (pp.
269-274).
Palmer, C. (1997). Music performance. Annual Review of Psychology, 48, 115-138.
Pittam, J., & Scherer, K. R. (1993). Vocal expression and communication of emotion. In: M. Lewis, & J. M. Haviland
(Eds.), Handbook of emotions. (pp. 185-197). New York: Guildford Press.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99,
143-165.
Ternström, S. (1996). Soundswell Signal Workstation v 3.4. Computer software, Soundswell Music Acoustics HB:
Stockholm, Sweden.
Back to index
Proceedings abstract
IT'S A PART OF ME: SHAPING MUSICAL UNDERSTANDING AND KINESTHETIC
SELF-IDENTITY THROUGH MUSICAL PERFORMANCE
Alyssa Lightbourn, Department of Ethnomusicology, University of California, Los Angeles
Background. In many studies of the psychology of music, the philosophical premises of research
designs privilege the intellectual understanding of the Western harmonic canon as the optimal
doorway to musical appreciation. While widespread familiarity with this theoretical canon informs
certain aspects of performers' musical experiences and facilitates seemingly objective, verbally
efficient accounts of musical understanding, too often, much about musical experiences is overlooked.
Indeed, musicians have the capacity to powerfully experience music according to various physical,
emotional, and psychological sources of knowledge that lie beyond the comprehension of music
theory. Furthermore, these forms of knowledge shape not only experiences of music but also the
development of musicians' personal identities. They contribute to the intimate connection between the
musician and the music.
Aims. As one part of a larger investigation, this study explores the kinesthetic aspect of musical
performance as it relates to the connection between musical experience and personal
kinesthetic-identity.
Method. Six students learning to play traditional North Indian tabla (drum) music, which involves
rhythmic, dynamic, timbral, and pitch-related subtleties, and their common tabla instructor served as
informants for case studies. The informants had varying degrees of both skill, from beginning-level to
professional-level, and familiarity with tabla music, from almost none prior to the initial tabla lesson
to having had tabla music included in their life-long cultural heritage. Also, all but one of the
informants were musicians before starting their tabla studies. In videotaped interviews, participants
provided extensive, detailed verbal accounts of their kinesthetic experience of playing tabla, pertaining to
(1) their perceived connection with the music or lack thereof and (2) their current and past kinesthetic
self-identity. Where they desired, interviewees supplemented their descriptions with performance.
Also, they contextualized their responses according to current and past emotional and intellectual,
musical and non-musical experiences.
Results. Amateurs' and professionals' kinesthetic experience of the musical performances shaped their
sense of connectedness with the music, their awareness of their bodies, and their approaches to
listening to music, albeit in different ways. All informants expected a stronger bond with both the
tabla and the tabla music with continued musical instruction.
Conclusions. The case studies demonstrate individual complexes of physical, emotional, and
intellectual understandings of music. Also, the studies show that performing tabla music contributes in
various ways to performers' senses of identity.
Back to index
Proceedings abstract
parncutt@kfunigraz.ac.at
Background:
Aims:
Piano performance majors at the University for Music and Performing Arts
attended a series of lecture-demonstrations on selected relevant recent
scientific research. Implications for piano teaching, practising, and
performance were discussed. Arguments were presented for and against strategies
arising from the research, drawing on both scientific evidence and authors' and
participants' practical experience. Two months after the third lecture,
participants were interviewed to explore if, and how, the presented information
had been of practical use. Their teachers were similarly interviewed.
Results:
A pilot study suggested that pianists vary in their willingness to accept and
use suggestions beyond those of their teacher. Of those who are open to
outside influences, pianists with a scientific background (e.g., those who
studied sciences at high school) and those with a relatively analytic approach
to teaching and learning (responding well to detailed instructions rather than
imitation or intuition) more often took advantage of scientific knowledge. No
consistent preference was observed for physical, physiological or psychological
aspects. Results of the major study will be presented at the conference.
Conclusions:
Back to index
Proceedings paper
Introduction
Studies of music performance provide evidence of the performer's ability to communicate attributes of the musical piece to
the listener, such as meter (Sloboda, 1983), phrasing (Repp, 1992; Todd, 1985), the dynamics of rhythmic gestures (Gabrielsson, 1987),
and texture hierarchy (Palmer, 1996b, 1989), among others. Musical structure thus becomes apparent in performance, which is
therefore the most illustrative medium through which to represent music (Palmer, 1996a). Even if a real performance of a musical
piece does not exist, it at least exists as a virtual one in the minds of both the analyst and the performer.
Performance microstructure (the set of modulations of tempo, dynamics, articulation, pedalling, vibrato, tuning, etc. that the
performer applies beyond the score's instructions) characterises the "personality" of a performance. It is well known that this
microstructure is highly controlled during an expert performance. Possibly, the structural representation that monitors the
performance is based on structural attributes of higher organisational order, such as tonality. Little is known, however, about
the relation between tonal structure and performance. Thompson & Cuddy (1997) found that performance microstructure is
very important in the cognition of psychological distances between different tonalities, and that the microstructure of the
different voices plays a particular role in this process. Thus, expressive components of the voice leading within complex textures
seem to be related to the understanding of the tonal relationships of the musical piece.
At the same time, many studies of the listener's representation of musical structure have been carried out. An important body of
research has analysed listeners' representations in terms of concepts derived from theoretical models, and some principles of
different theories of musical structure have already begun to be explored (Deliège, 1987; Dibben, 1994; Krumhansl, 1995). Within
this research tradition, Serafine, Glassman & Overbeeke (1989) provided evidence of listeners' ability to match a melody with a
rendering of its underlying structure, pointing to the cognitive reality of hierarchic structure. Notably, although there is great
interest in investigating how structure is communicated during performance, this issue has not been addressed in studies of
listeners' representations of musical structure.
The aim of the present work is to re-examine the results of that study (Serafine et al., 1989), taking into account the influence of
certain peculiarities of the performance. To this end, different performances are used as independent variables.
The analysis of the underlying voice leading would seem to be important for rendering a coherent performance (Cook, 1990;
Rothstein, 1995). However, there is no clear evidence of objective indicators of a hierarchic representation in actual performances.
Cook (1987) studied timing in relation to structure in a Bach prelude, providing some evidence of timing related to long-term
prolongational structure; at the small scale, however, he presents only partial information, interpreted in an ambiguous way.
This paper focuses on timing as an attribute of the microstructure. During performance, timing reveals both a low-level mode of
organisation related to psychoacoustical phenomena and a high-level mode involving structural organisation (Penel & Drake, 1998;
Repp, 1998). A second aim is to analyse the underlying voice leading as a potential source of timing.
Study of Performance
Two restrictions limited the selection of the stimuli used by Serafine et al. (1989):
1. Given the interpretative nature of the reductions, and in order to adopt the interpretation provided in that study, only the
four analysed examples reported there could be selected.
2. Given the importance of vertical timing (chord asynchrony) for analysing the problem of the underlying voice leading
(Palmer, 1989, 1996b), the study was restricted to monophonic pieces, since the stimuli were not obtained via MIDI, at
present the only procedure that has proved appropriate for the study of vertical timing. This reduced the possible
selections to a single musical piece.
Although this restriction constrains the scope of the present study, the results may nevertheless provide useful evidence for future
work on performance.
Method
The Performances
Six expert performances of the Bourrée I from the Suite No. 3 in C Major for solo cello by J. S. Bach (measures 1 to 4, with the
upbeat) were selected (Figure 1A). The melody consists of a sequence of six rhythmic groups of 2, 2, 4, 2, 2, and 4 beats, in which
the last duration is always the longest. The performers were Pablo Casals (PC), Pierre Fournier (PF), Maurice Gendron (MG),
Mstislav Rostropovich (MR), Paul Tortelier (PT) and Yo-Yo Ma (YM).
Thus the collection of versions, although small, comprises famous interpreters whose styles are both distinct and widely
recognised, representing a wide range of qualified interpretations of the piece.
Figure 1. a) Bourrée I from the C Major Cello Suite by J. S. Bach, mm. 1-4. b) The foreground reduction, after Serafine et al.
(1989).
Procedure of Measurement
A standard sound-editing program (Sound Forge 4.5), which displays waveforms (amplitude envelopes), was used for the analysis.
Measuring onsets from cello signals is difficult, mainly because of the noise produced by the bow before the pitch is clearly
defined. Since the performer intuitively uses that interval of time to regulate the onset, timing was taken to be determined by the
moment at which the sound is perceived as a melodic tone. The predominantly non-legato articulation eased the task by providing
clear wave decays for each note. However, the low register, the arpeggi, and the condition of the original recordings added a great
deal of noise and ambiguity to the onsets of certain notes. An aural procedure was therefore followed, in which increasingly small
segments of the wave were subjected to aural testing. Both researchers analysed the performances separately. Differences between
the two measurements of each onset did not exceed 10 ms; where they did, agreement was reached through inter-rater sessions.
Thus both visual and aural cues were considered and, where necessary, further analyses of the fundamental frequency and
spectrogram were run.
In this way, 22 inter-onset intervals (IOIs) were determined. Onsets 2-1 (measure 2, first beat) and 4-1 correspond to a chord,
which, given the possibilities of the cello, is performed as an arpeggio. In such cases the onset considered was that of the highest
pitch, since this is the note represented in the reductions. Each IOI, measured in milliseconds, was divided by the nominal value of
the note at the tempo of the performance. The tempo is obtained by dividing 15,000 (the 60,000 milliseconds of a minute divided
by 4, the number of quavers contained in a single temporal unit, the half note) by the actual average duration of the minimal unit.
This yields the proportion by which the actual performance deviates from the nominal value. These values were plotted as
expressive timing profiles, in which the horizontal axis represents time and the vertical axis the proportion of deviation.
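As a minimal sketch of the computation just described, the following uses invented IOI values (the measurements from the six recordings are not reproduced in the text) to derive a basic tempo and a timing profile:

```python
# Hypothetical inter-onset intervals (ms) and nominal note values in quavers;
# these are illustrative figures, not the paper's actual measurements.
iois_ms = [210, 205, 430, 200, 215, 440, 205, 210]
nominal_quavers = [1, 1, 2, 1, 1, 2, 1, 1]

# Average duration of the minimal unit (the quaver) across the fragment.
avg_quaver_ms = sum(ioi / n for ioi, n in zip(iois_ms, nominal_quavers)) / len(iois_ms)

# Basic tempo in half notes per minute: 15,000 = 60,000 ms per minute / 4 quavers per half note.
tempo = 15000 / avg_quaver_ms

# Timing profile: each IOI divided by its nominal value at this tempo.
# Values above 1 indicate lengthening, below 1 shortening.
profile = [ioi / (n * avg_quaver_ms) for ioi, n in zip(iois_ms, nominal_quavers)]
```

By construction the profile averages 1.0, so the plotted curve shows deviations around the nominal grid.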
Results
Overall characteristics of timing
Basic Tempo (BT). Determining the tempo is important because (i) it is the basis for calculating the timing profiles, and (ii)
different tempi probably lead to different ways of encoding the timing, given that there is an optimal zone for the detection of
temporal variations (Drake & Botte, 1993). The more varied the timing modulation, the less applicable the procedure explained
above for obtaining the basic tempo of a performance, because sharp variations produce strong deviations from the nominal
proportion (Repp, 1998). The selected fragment is not expected to show this problem, since the dance style and tempo call for
relatively tight timing. However, for reasons of performance technique, the two chords (mm. 2-1 and 4-1) might substantially alter
the average IOI; they were therefore removed from the calculation of the basic tempo, which considered only 20 IOIs.
The tempi used by the cellists ranged from half note = 59 (MR) to half note = 88 (YM) [PC = 83; PF = 71; MG = 85; PT = 74].
Relative Modulation Depth (RMD). This is the coefficient of variation of the IOIs (SD/mean), which allows the amount of temporal
variation to be compared across the examples even though they are performed at different tempi. The degree to which temporal
variation can be detected also depends on the level of dispersion of the sequence (Drake & Botte, 1993). The RMD ranged from
0.24 (MG) to 0.71 (MR) [PC = 0.45; PF = 0.30; PT = 0.42; YM = 0.33]. The cellists' tempi tended to correlate negatively with
RMD, with marginal significance given the small number of degrees of freedom, r(4) = -.753, p < .084. This suggests an
association between slow tempi and greater variability.
Figure 2. A) Means of the timing profiles. B) Factors I and II. C) Comparison of Factor I with the performances of Rostropovich
and Ma. D) Comparison of Factor II with the performance of Gendron.
Group 1. Although the components appear very similar, factor I shows a higher relative duration of the first note (E).
Group 4. While factor I (FI) lengthens the passing tone E, factor II (FII) shortens it. Thus, given that the rhythm emphasises F
because it is a quarter note, FI reinforces the interval E-F and FII the interval D-F (hereafter Property A).
Group 3. A great lengthening of the C characterises FI; conversely, FII lengthens the B. The latter is largely emphasised by the
noticeable shortening of the A. Thus FI shows a relative emphasis of the A, while FII conceals it (Property B).
Group 6. In the parallel motive the emphasis is almost reversed on the first note: now it is FII that emphasises the G preceding the
arpeggio. At this point a conflict emerges between the metric position of the tones and their membership of the C Major chord. In
both factors, timing is used in search of equilibrium. In m. 4-1, FI privileges the metric component and FII the tonal one
(emphasising the E). Conversely, in m. 4-2 the pattern is reversed: FII emphasises the metric component while FI emphasises the
tonal one. Therefore the final emphasis of FI is E-C and that of FII is D-C (Property C).
Discussion
The analysis of expert performances of the first phrase of the Bourrée I from the Suite No. 3 by Bach revealed that renowned
artists use a variety of timing strategies. Nevertheless, some commonalities can be found: for example, the acceleration of the
lower voice reveals that timing is used to differentiate polyphonic principles within this monophonic texture. Timing strategy thus
operates to clarify particular features of the voice leading, at least in an overall sense. Another commonality is the lengthening of
the IOIs involving arpeggi.
Differences in timing strategy range from global aspects (e.g. the adopted tempo) to local effects (e.g. the performance of
non-chord tones). Despite the rhythmic features, no association could be found between a specific tempo and a particular timing
strategy; on the contrary, the two extreme tempi (Ma and Rostropovich) were identified with the same factor.
Noticeably, differences in timing strategy appear to be connected with prolongations at the voice-leading level. Thus, looking at
the reduction in Figure 1b, FI and FII may be understood as different strategies for the performance of the passing notes and
neighbour notes.
Expressive deviations marking phrasing or texture would not be expected, because the fragment is short and monophonic. Tonal
and rhythmic-metrical components would therefore be the most relevant to the microstructural organisation of the example. Some
of the temporal variations could be explained as the result of bottom-up processes involved in rhythmic and melodic perception
(Drake, 1993). But the lack of similar timing patterns matching similar structural patterns reveals the existence of other sources
related to the deeper musical structure.
Very different strategies could be observed not only between artists but also within a single musician. For example, Rostropovich
uses two different strategies to address a similar structural problem: in m. 2-2 he privileges the metric structure and lengthens the
note in the strongest metric position; in m. 4-2, on the contrary, he shortens the note in the strongest metric position and
emphasises the tonally structural note. It seems, then, that neither the metric structure nor the underlying voice leading can
separately explain all the temporal alterations.
Study of Listening
In order to verify the influence of the performance on the representation of the hierarchic structure, Experiment 1 of Serafine et
al. (1989) was followed. In that experiment subjects listened to a model melody and two reductions (a true reduction and a foil, in
Schenkerian terms) and matched one of them to the melody. In our experiment the different versions were used as independent
variables, under the assumption that subjects would tend to choose the foil if it displays notes which, although not structural, are
emphasised by the performer. For example, if the foil displays an E in m. 2-4 (Property B) instead of a D (Figures 1a and 1b;
compare Figure 3b), subjects should prefer it more when matching it with Rostropovich's performance (which emphasises the E)
than when listening to Gendron's version (which shortens the E).
To minimise the effect of repetition of the piece on the learning of its structure, the number of versions was reduced to those
representing the most interesting interpretations to be tested: Gendron's version (representing FII) and Ma's and Rostropovich's
(representing FI, with different tendencies in the Properties and extreme tempi).
Method
Subjects
N = 40 (60% with moderate musical experience, mean = 4.8, and 40% without musical instruction). Mean age = 21.4 years (range
18-36).
Stimuli
Stimuli consisted of three of the performances studied: those of Gendron, Rostropovich and Ma. In addition, four reductions were
synthesised for each of them. The first, the Original (Figure 1b), is the one proposed by Serafine et al. (1989). The other three,
Foils A, B and C, each differ from the Original by only one note (Figure 3). The change in FA refers to Property A. Note that the
B was not replaced by the A but added to it, because of the remarkable emphasis on the B produced by the arpeggio; eliminating it
would have distorted the task. For this reason FA is "less reduced", that is, it corresponds to a more superficial level. FB presents a
change referring to Property B (D replaced by E). FC presents a change referring to Property C (E replaced by D).
Figure 3. Foil reductions used as lures. FA: B replaced by B-A in m. 2 (Property A). FB: D replaced by E in m. 2
(Property B). FC: E replaced by D in m. 4 (Property C).
The reductions kept both the tempo and the timing profile of the corresponding version: for each note of the reductions, the onsets
in milliseconds were determined according to the way the artist played them in the original version. However, since the reductions
did not include the arpeggi, keeping the whole preceding IOI sometimes sounded unmusical, with a break that interrupted
continuity; for that reason a proportional shortening of this lengthening was applied. Dynamics and timbre (cello) were kept
constant across the four reductions.
Procedure
In each trial, subjects listened to the melody (Model) and two reductions (1 and 2) in the following sequence: Model, 1, 2, Model,
2, Model, 1, Model, 2, 1. One of the two reductions was always the Original; the other was a Foil. After listening to the sequence,
subjects 1) chose "the best reduction of the model", and 2) indicated how sure they were of their answer (not sure, more or less
sure, or very sure). There were two warm-up examples taken from another fragment. A 15-second fragment from other pieces for
cello by Bach was included to separate the trials, spacing out the repeated hearings of the same fragment. The whole session lasted
approximately 20 minutes.
Design
There were three sequences for each performer, comparing Original/Foil A (Property A test), Original/Foil B (Property B test),
and Original/Foil C (Property C test). Thus the whole test consisted of 9 trials, presented in different orderings according to 1) the
order within the pair, 2) the Foil belonging to the pair, and 3) the performer.
Predictions:
1. Property A test: listening to the pair Original/FA, subjects will prefer the Original for Gendron and Ma, and FA for
Rostropovich;
2. Property B test: listening to the pair Original/FB, subjects will prefer the Original for Gendron, FB for Rostropovich, and an
intermediate value for Ma;
3. Property C test: listening to the pair Original/FC, subjects will prefer the Original for Rostropovich and (to a lesser extent)
Ma, and FC for Gendron.
Figure 4. Means of the ratings for the different foils (A, B and C) against the Reduction in the versions by MG, MR
and YM. The lower the rating, the greater the preference for the Foil.
General Discussion
The analysis of the performances showed different timing strategies. Many of the differences concern the relative lengthening of
the structural notes compared with the more superficial ones. Possibly the variety of timings manifests diverse ways of conceiving
musical structure in reductional terms.
The main aim of the present study was to verify performance effects on the representation of the tonal structure. Although a priori
the fragment employed does not seem to require noticeable timing variations in order to be performed expressively, the RMD
(Relative Modulation Depth) values revealed different modes of microstructural control.
The performer's ability to emphasise notes in weak metric positions through subtle lengthenings shows that aspects of timing,
combined with conditions of melodic-tonal coherence, may strongly influence the mental configuration of the underlying
structures. Although the findings are not strong, mainly because of the limitations of a highly constrained design, it is possible to
state that musical performance affects the task of matching a melody with a rendering of its structure. On the one hand, timing
strategies may reveal the kind of structural representation held by the performer; on the other, they may convey that structure to
the listener.
From a structural point of view, different microstructures may facilitate or interfere with the processes of tonal tension and
relaxation in the listener's representation. This is congruent with the findings of Thompson & Cuddy (1997). It may, however,
appear to contradict important concepts of a theory that does not take durational aspects into account. Notably, the reductions
presented here lie very near the surface, a context in which rhythm is taken into account by the theoretical framework. Relatedly,
it is important to note that listeners tended to match much more strongly the option presenting a more superficial (even subtly so)
level. Although this aspect requires further investigation, it may indicate that the reductional process is not automatic but requires
activation. If the listener cannot activate this process, he will remain at a more superficial level. If this holds, it would highlight
the role of performance as an activator of the process of reduction.
Although the results are preliminary, they speak to the need to consider the findings of listening studies alongside research on the
microstructural components of performance. This is particularly relevant to studies that aim to create ecologically valid contexts.
Although only one aspect of the microstructure has been investigated, other attributes may well have as much influence as timing
on hierarchic listening, or more. It is assumed that dynamics and the control of tuning and vibrato, in instruments that allow it,
may be powerful attributes providing important cues to the listener during the abstraction of the tonal hierarchies.
References
Cook, N. (1987). Structure and Performance in Bach's C Major Prelude (WTC I): An Empirical Study. Music
Analysis, 6 (3), 257-272.
Cook, N. (1990). Music, Imagination, & Culture. Oxford: Oxford University Press.
Deliège, I. (1987). Grouping Conditions in Listening to Music: An Approach to Lerdahl & Jackendoff's
Grouping Preference Rules. Music Perception, 4 (4), 325-360.
Dibben, N. (1994). The Cognitive Reality of Hierarchic Structure in Tonal and Atonal Music. Music Perception,
12 (1), 1-25.
Drake, C. & Botte, M. C. (1993). Tempo sensitivity in auditory sequences: Evidence for a multiple-look model.
Perception & Psychophysics, 54 (3), 277-286.
Drake, C. (1993). Perceptual and performed accents in musical sequences. Bulletin of the Psychonomic Society, 31,
107-110.
Gabrielsson, A. (1987). Once Again: The Theme from Mozart's Piano Sonata in A Major (K. 331). In A. Gabrielsson
(Ed.), Action and Perception in Rhythm and Music (pp. 81-103). Publications issued by the Royal Swedish Academy of
Music No. 55.
Krumhansl, C. (1995). Music Psychology and Music Theory: Problems and Prospects. Music Theory Spectrum,
17 (1), 53-80.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human
Perception & Performance, 15, 331-346.
Palmer, C. (1996a). Anatomy of a Performance: Sources of Musical Expression. Music Perception, 13 (3),
Recording References
Bourrée I from Suite No. 3 in C Major for Solo Cello
(Artists. Company. Number)
Casals, Pablo. EMI. CDH - 7 61028 2
Fournier, Pierre. Archiv Produktion. Stereo 449 711-2 gior 2
Gendron, Maurice. Philips. 442 239-2
Ma, Yo Yo. CBS Masterworks. M2K 37867
Rostropovich, Mstislav. EMI. 7243 5 55365 2 5
Tortelier, Paul. EMI. 7243 5 73526 2 8
Demonstration Papers
These run concurrently with the poster sessions on Sunday and Wednesday
First Author   Poster title                                                                  Day        Room
Dalgarno, G.   A vibroacoustic system for experiencing music                                 Sunday     Fraisse
Furno, S.      Concepts et catégorisation dans le champ du son musical: Le TAS               Sunday     Helmholtz
Gholson, S.    A study of the nature of components of disciplinary structure in applied
               violin performance in expert teaching practice (ABSTRACT)                     Sunday     Wing
Webster, P.    Music composing software for people from ages 6 to 60                         Wednesday  Seashore
Proceedings paper
Introduction
Vibrotactile and vibroacoustic chairs/couches have been used in two areas: for therapeutic purposes, and for helping
hearing-impaired people to perceive music. Pioneers of the therapeutic use have been Olav Skille [1] and Tony Wigram [2]. Our
own work was for many years solely in the music-perception area [6], but in the last three years we have moved very much into the
therapeutic area.
Systems have been produced commercially; in the UK these include those made by The Sound Beam Project, and overseas the
Somatron Corporation in the USA and a number of systems from Scandinavia.
These fall into two classes: (a) those in which the music is played through headphones and the vibration in the chair or couch is
unrelated to the music (e.g. those designed or employed by Skille and Wigram); and (b) those in which the vibration is driven by
the music itself.
Type (b) is of course the only type of interest for the perception of music by hearing-impaired people. We believe it is also to be
preferred for therapeutic use, other than for specific physical treatments designed, for example, to improve joint angles in people
with cerebral palsy. (The latter topic is not covered in this paper.)
Why, then, is it not more widely used? We believe this is because of the several technological problems that have to be solved to
make it work in a truly satisfactory way. The solutions to these will be described. Incidentally, there is no conflict between the
requirements of designing a chair/couch for therapeutic use and for music perception; only the way of using it differs.
Before one can discuss the upper limit of pitch response one has to define the conditions. There are several mechanisms in the body
through which mechanical vibrations may be transduced into nerve impulses that the brain can receive. Some of these mechanisms
operate best at low frequencies, e.g. around 30 Hz; the one best capable of transducing the higher vibrational frequencies, up to a
maximum of about 950 Hz, is the Pacinian corpuscles [4], [5]. However, the energy required to obtain a response increases very
rapidly with frequency. Hence if extremely high powers are used at the frequencies in question, the by-product is a volume of
sound that most people would consider unacceptable. Indeed, almost any design could be pressed into giving a higher frequency
response if the resultant volume of sound were not a problem; moreover, as well as being excessively loud, the sound would be
distorted and unpleasant to the ear irrespective of its volume.
Hence we define the response as that obtained using no more than 10 watts of electrical power supplied to the transducer at any
frequency in the range claimed. Most of the vibroacoustic chairs/couches on the market are then capable of responding up to about
the note A3 (in the notation where C4 is middle C).
This is the fundamental reason why, at acceptable levels and quality of the sound produced as a by-product of the vibration, many
systems give a sensation of little more than the rhythm and the bass.
A major part of our work has been the design of a suitable acoustic coupling system. Our most recent design enables us to go an
octave higher, to A4.
The details of this design, which also helps in obtaining the desirable type of sensation described in 1., must remain confidential
until patenting is complete.
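The two limits just mentioned (A3 for most commercial systems, A4 for our design) can be checked against the roughly 950 Hz Pacinian ceiling with a standard equal-tempered conversion. This sketch is purely illustrative and not part of the system described:

```python
# Equal-tempered note frequencies with A4 = 440 Hz, in the notation where C4
# is middle C (as in the text). Illustrative only.
def note_freq(semitones_from_a4):
    return 440.0 * 2.0 ** (semitones_from_a4 / 12)

a3 = note_freq(-12)  # upper limit of most commercial chairs/couches (220 Hz)
a4 = note_freq(0)    # upper limit of the coupling design described (440 Hz)

# Both sit below the ~950 Hz ceiling of Pacinian-corpuscle transduction, but
# the energy needed to obtain a response rises steeply towards that ceiling.
```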
This means that the tune, rather than just the bass, of a small number of original pieces can now be perceived. By using
arrangements (e.g. pieces written for, say, the violin but arranged for the cello or euphonium) the repertoire can be much enlarged.
Further, with a modest amount of downward pitch shifting, a reasonable number of original pieces are also brought within range.
It should be remembered that large amounts of pitch shifting, or transposing by more than an interval of a fourth, usually give a
result that is musically unacceptable, for reasons well understood in musical acoustics. However, this of course applies to listening
through normal hearing; hearing through the ears is dealt with separately, as will be described later.
3. To obtain an even response to different pitches over the usable range.
Electronic equalisation can be applied, but for this to be successful the acoustic design should be such that there are no
sharp peaks and troughs in the frequency response of the transducers combined with their coupling system to the body. (An
exactly similar requirement is found in the design of loudspeaker drivers and loudspeaker enclosures for hi-fi use.)
Once this is achieved, equalisation can be applied to the signals sent to the power amplifiers so that the sensation of
vibration at the different frequencies in the range is judged subjectively equal, or rather at a level corresponding to
approximate equality when listening to sounds of the same pitch.
Because the degree of equalisation required changes quite rapidly, albeit smoothly, with frequency, it is indispensable to
use 1/3-octave equalisation. This can be achieved conveniently using a commercially available 30-band graphic
equaliser, such as the Alesis model M-EQ230. The table below shows a typical equaliser setting for our system, this
being a combination of the relative sensitivity of the body and the particular design of the acoustic coupling and of the
transducer.
Frequency (Hz) 25 31 40 50 62 80 100 125 160 200 250 320 400 500 640 800 Onwards
Gain/Atten. (dB) 0 -6 -8 -9 -10 -11 -12 -12 -11 -10 -4 4 10 12 -2 -12 -12
Why not apply even more equalisation, e.g. by using two such graphic equalisers in series? Because, as discussed in 2., the
volume of sound produced as a by-product, together with its unpleasantly distorted nature, prohibits it.
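The dB figures in the table convert to linear amplitude factors by the usual 20·log10 rule. A small sketch of that arithmetic (our own illustration, not the equaliser's internals):

```python
# 1/3-octave gains from the table above (band centre frequency in Hz -> dB).
gains_db = {25: 0, 31: -6, 40: -8, 50: -9, 62: -10, 80: -11, 100: -12,
            125: -12, 160: -11, 200: -10, 250: -4, 320: 4, 400: 10,
            500: 12, 640: -2, 800: -12}

def db_to_amplitude(db):
    """Linear amplitude multiplier for a dB gain (20 dB per factor of 10)."""
    return 10.0 ** (db / 20.0)

# The +12 dB boost at 500 Hz multiplies amplitude by roughly 4;
# the -12 dB cuts divide it by about the same factor.
boost_500 = db_to_amplitude(gains_db[500])
```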
4. To carry out amplitude compression.
The volume of symphonic music, live at a classical concert, might range from 35 dB HL to 115 dB HL. While the ear can cope
with such a very large range, the tactile sense certainly cannot, and very considerable compression is necessary, so that the range
of volume variation is within 10 dB, and ideally 6 dB.
In "pop" music the range is much smaller, and the material is sometimes already amplitude-compressed, reducing further the
already small range; sometimes little or no additional compression is necessary.
There is no technical difficulty in carrying out the compression; a commercial dynamic range compressor such as the Behringer
model MDX 1200 or the Alesis model 3630 is suitable for the purpose. If very great compression is required it may be necessary
to use two compressors in series with appropriate settings on each.
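Collapsing the 80 dB concert range (35-115 dB HL) to within 10 dB implies a ratio of about 8:1 above the threshold. A minimal static compression curve illustrating the arithmetic (the threshold and ratio here are our illustrative choices, not published settings):

```python
# Static compression curve: input levels above the threshold grow at 1/ratio.
# Threshold and ratio are illustrative, not the settings of any named unit.
def compress_level(level_db, threshold_db=35.0, ratio=8.0):
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# An 80 dB input range (35..115 dB HL) collapses to a 10 dB output range.
out_range = compress_level(115.0) - compress_level(35.0)
```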
A note on 3. & 4.
An alternative to using a 1/3-octave equaliser and a dynamic range compressor (on each separate channel, of which there may be
up to 4) is to produce a special recording on tape with this processing already done.
5. To have an adequate means of conveying pitch.
When perception via vibration, rather than a pleasing accompaniment to hearing, is desired, just 2 or 3 monophonic parts are
ideal, particularly to start with. Later, as the person builds up experience of vibratory perception, they can perhaps go on to as
many as 4 parts, but we think not beyond that number. Music restricted in this way is strongly recommended for profoundly deaf
people.
For therapeutic use this requirement is unnecessary, which is fortunate, as it would be too restrictive of the repertoire of music for
this purpose. But some reduction of the original music is often useful where the music is complex, and this may be done by
selecting suitable arrangements.
8. Separation of parts.
The ear/brain separates the sounds of instruments wonderfully well: the sound of a prominent oboe part in a symphony orchestra
can be picked out against a background of perhaps 80 instruments playing at the same time, a task that even a room full of
computers is unable to accomplish.
The body, through the sense of touch, certainly cannot do this. Hence we believe it is very important to simplify the body's
perceptual task, and we do so by splitting up the parts and sending them to different parts of the body. For example, suppose we
have a trio arranged for baritone saxophone, trombone and bass guitar. We might place the vibration generated by the saxophone on
We do not have the skills or knowledge to achieve the above ourselves; accordingly we begin with what we believe to be the best
of the existing commercial products, the Somatron range of vibroacoustic chairs/couches/beds, which as well as being good
vibroacoustically meets all the above criteria very well. We then either replace the vibratory drive units with units of our own
design, or modify the Somatron drive units to our own design.
References
[1] Skille, O. (1992) Vibro Acoustic Therapy : Manual & Reports. Levanger, Norway. ISVA
[2] Wigram, A.L. (1996) The Effects of Vibro Acoustic Therapy on Clinical and Non-Clinical populations. PhD Thesis.
www.members.tripod.com/~quadrillo/VAT/tonyphd.htm
[3] Summers, I.R. (Ed.) (1992) Tactile Aids for the Hearing impaired. London: Whurr Publishers Ltd
[4] Bolanowski, S.R. et al. (1988) Four channels mediate the mechanical aspects of touch. J. Acoust. Soc. Am. 84: 1680-1694.
[5] Verrillo, R.T. et al. (2000) Some basics of tactile sensation: temporal and spatial considerations. ISAC'00, Univ. of Exeter.
[6] Dalgarno, G. (1989) A computer based system for music for hearing impaired people". Proceedings of the Second National Conference
Acknowledgements
To Professor John A Sloboda for much advice, support and encouragement.
To the musicians who created much of the "doubled part" music described in section 8: Karen Twitchett, Professor John A.
Sloboda and Dr Steve Roberts.
To the support staff, particularly Chris Woods, in the Dept. of Psychology at Keele for much practical help.
To Somatron Inc for help with the supply of equipment and for substantial technical advice.
The work has been possible through grants from the following:-
The Orpheus Trust, The Norman Collinson Trust, The Arts Council of England, The Sport and Art Foundation, The National
Lottery Charities Board, whose financial support is gratefully acknowledged
Back to index
Proceedings paper
CONCEPTS AND CATEGORISATION IN THE FIELD OF MUSICAL SOUND: THE TAS (TEST D'ATTRIBUTS DU SON)
1. BACKGROUND
The ability to distinguish sound events in the environment is one of the perceptual tasks proper to a living being's adaptation to its milieu. The variety and
richness of the sound repertoires that members of different species use to communicate show auditory discrimination abilities of
varying degrees of subtlety.
However, structuring sounds into categorial systems, or constructing discourses that can be understood in musical terms and shared from an
aesthetic perspective, is a task of far greater scope, reserved in principle to human beings.
Just as humans construct classificatory schemes of all kinds that allow them to order facts and phenomena so as to understand the world better, knowing
sounds in musical contexts requires principles and categories that allow one to understand music better and to find pleasure in
listening to it.
The literature on concept formation refers to processes of abstraction, to criteria for grouping by shared properties, and to systems of
classification and categorisation.
The construction of concepts occupies a prominent place in psychological research into cognitive processes. These questions are linked to
problems belonging to memory (recognition), attention and representation. As regards concept formation, various models and
hypotheses have been proposed, resting on well-established principles from associationist theories, schemas, prototypes, and "organising models" (Moreno Marimón,
1998). A considerable number of resources and tests of different kinds are thus available for exploring concept formation. It is far harder to find
comparable tests for studying concept formation in the field of music.
This study presents an instrument specially designed to explore the formation and development of concepts in the field of musical sound, named the Test
d'Attributs du Son (Sound Attributes Test, TAS). As far as we know, no specific instruments or tests for this purpose exist in the field (Madsen, C., 1999).
The design and production of the TAS are being developed within the "Programa de Incentivo al Docente-Investigador" of the Universidad Nacional de La Plata, Buenos Aires,
República Argentina.
2. AIMS
The purpose of this test is the analysis of certain processes engaged in the formation of musical concepts. To understand better the mechanisms involved in the process of
conceptualisation, the following actions are considered representative:
a. exploring similarities and differences among sounds;
b. abstracting the common features of the perceived sounds;
c. finding a principle that can link the abstracted features;
d. deploying heuristic strategies/procedures to solve problems of matching and relating sounds;
e. transcoding the features of the perceived sound into words (naming the attributes, describing similarities or differences, using metaphors, etc.);
f. activating other modes of representation to communicate characteristics of the perceived attribute when it cannot be expressed in words
(that is, making faces or gestures, using vocal imitation, etc.).
A further aim of the study is to verify whether these abilities differ between musician and non-musician listeners.
● colour or instrumental timbre (a piano, a guitar, etc.; sounds produced by a wooden or a metal object; characteristics such as brightness or dullness, etc.);
The following criteria were considered in establishing this selection:
a. the need to respect the combinatorics of the four attributes; and
b. the use of musical sounds likely to be familiar to musician and non-musician listeners of different ages.
As a first check, the degree of differentiation of the sound attributes to be included in the test was submitted to experts. On the basis of their judgements the
original selection was refined.
● four quadrants in which the sounds can be classified, each with a different colour.
The spheres are identical, so as to centre the subject's attention on the auditory stimuli. Through this visual presentation the sounds are assumed to acquire a body. They
thus become "manipulable": the subject operates with the mouse on the spheres, which
● sound, when pointed at with the hand cursor and clicked with the right mouse button;
● change place, when dragged with the hand cursor while holding down the left button and moving the mouse.
The spheres can be placed anywhere in the quadrants, or returned to the starting point by dragging and releasing them on the spiral of the central circle. It is
thus possible to group the sounds, separate them, listen to them in a different order, and so on.
The trigrams representing the sound categories are hidden. They become visible when the examiner, from the keyboard, activates certain commands to allow
feedback.
Once the subject is judged to have understood the nature of the task, the examiner provides the initial aid: on a key press, a sphere moves to the yellow
quadrant, where it remains fixed with its trigram visible. The subject cannot move this sphere, only listen to it.
From this point on the subject operates on the sounds without any guidance: he has all the time needed to try different ways of grouping the sounds and to
solve the task. He listens and compares the sounds with one another and/or with the sound of the initial aid. Then, after making all the trials he needs, he places them in the category to
which, in his judgement, they belong.
When the subject has completed a first attempt at grouping, the examiner asks him to explain it. If the grouping was wrong, the examiner presses a key to
show the label of a sound that has been miscategorised; then, with another key and the right mouse button, the examiner makes the trigram of
that sound visible. The subject then has two aids which he can use to modify the search criterion (change of hypothesis).
These actions are repeated until the subject:
a. finds the solution (the criterion by which the 22 sounds can be classified into four groups),
b. insists on treating the answer he has found as correct (despite the aids received), or
c. abandons the task.
At the end of the test, the examiner uses a key to make all the trigrams visible and thereby confirm the correctness of the answer.
Finally, the examiner presses the "Sortir" (Exit) button to end the task.
● comparisons between two sounds can only be made successively, which requires retaining the first sound in order to compare it with the second;
● sustained alertness is essential while the stimuli are being presented; if the subject forgets the sounds, he must listen to them again;
● the time of contact with the stimuli is predetermined and cannot be adjusted by the subject.
● two university students, non-musicians, producing a concurrent verbal report (thinking aloud)
4. RESULTS
During the administration of the TAS the subject's complete behaviour is recorded (that is, an attempt is made to capture "everything he does and everything he says"). The
data obtained therefore come from three sources: the automated information stored by the software, the verbal protocol recorded on tape, and the observations
collected by the examiner.
A first scheme for categorising the responses was drawn up, based on degree of correctness and degree of efficiency.
By degree of correctness, responses can be categorised as:
A.- Totally correct responses. These present
i. a selection of sounds sharing two critical attributes;
ii. a justification of the selection (description of the two attributes).
B.- Partially correct responses. These present
iii. a selection of sounds sharing one attribute;
iv. a justification of the selection (description of the shared attribute).
C.- Incorrect responses. These present
v. a selection of sounds with no shared attributes.
The degree of efficiency of the response, in all categories, can be estimated from
❍ the number of trials needed (amount of aid); and
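The correctness scheme above can be encoded compactly. This is only an illustrative sketch; the authors score recorded protocols by hand, and the function name and the rule for unjustified selections are our assumptions:

```python
# Illustrative encoding of the A/B/C correctness categories. Treating an
# unjustified selection as incorrect ("C") is our assumption, not the authors'.
def classify_response(shared_attributes, justified):
    """Categorise a grouping response by how many critical attributes the
    selected sounds share and whether the subject justified the selection."""
    if shared_attributes >= 2 and justified:
        return "A"  # totally correct
    if shared_attributes == 1 and justified:
        return "B"  # partially correct
    return "C"      # incorrect
```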
● The non-musician adults attended first of all to instrumental timbre, relating the sounds to familiar sound sources.
● The additional aids provided enabled subjects to consider the relationship between two attributes.
● Limited availability of a specific vocabulary became evident when the subjects described the attributes. Are these difficulties perceptual,
discriminative, or of both kinds?
● The musicians resorted to a detailed analysis which made grouping into broader categories difficult.
● The 6- and 9-year-old children reached partial solutions: they considered only one attribute, presumably the most salient one.
● The 9-year-old considered only instrumental timbre (in the same way as the non-musician adults).
● From a preliminary study of the protocols, it appears that, in children as in adults, prior musical practice is a variable of great
importance.
Moreover, certain observed response tendencies suggest actions that could be associated with the use of strategies, for example:
● taking advantage of information found by chance, a reaction that could be associated with insight;
● taking one sound as a reference (a prototype?), apparently holding it in short-term memory, and using it for comparison with the others;
● re-listening to the sounds, apparently in order to monitor one's own action;
● proceeding in some cases by trial and error and in others through an ordered scheme; that is, actions illustrating different styles of
approach to the problem.
As for the two forms of the TAS, preliminary experimental data show that responses to form B were congruent with those obtained a month
earlier with form A. Performance was the same as regards degree of correctness of the responses, the amount of aid needed and the comments made.
5. DISCUSSION
The information gathered so far will be presented, organised around:
1. the usefulness of the software for presenting and recording the experimental data;
2. thinking aloud (the verbal protocol);
3. the examiner's observations;
4. the protocol.
● subjects, regardless of age, tend to perceive the task as a game rather than as a test situation. The use of aids to orient the
response stimulates curiosity and predisposes to play. The playful presentation seems to increase the task's appeal and to reduce the effects of fatigue. Some
playful behaviours were observed, such as leaving the spheres on the spiral, using them to build figures, or grouping them and making them sound in turn while listening to the
resulting sound relations. This playful character obliges the examiner to determine in each case whether the subject is actually solving the task or merely
playing;
● centring attention on the auditory stimulus is favoured by the possibility of isolating the sounds of traditional instruments, stripping them of their visual appearance.
That is, the technology makes it possible
❍ to emulate instrumental sounds with a high degree of fidelity; thus the digitised flute sound recalls the sound of a real flute;
❍ to reduce the association with the visual stimulus, since all the sounds are presented with an identical visual appearance; the sound of the flute, for example, has not the
characteristic shape of that instrument but that of a sphere, and likewise the trumpet, the mandolin and the other instrumental timbres;
● the risk of guessing the answer (Cronbach, L. J., 1998, p. 95) can be greatly reduced compared with the MDS, because the software keeps the trigrams hidden except
when the examiner activates the corresponding commands to make them visible.
❍ locating each sound throughout the test and knowing its exact place in the quadrants;
● Likewise, graphs can be obtained representing the position of the sounds in the quadrants.
● impatience and nervousness due to the difficulty of describing matters relating to the sounds;
Despite the additional work involved in retrieving and recording the verbal protocol (Das, Kar & Parrila, 1998) and the complementary data in the protocol,
this information is considered to be of great value, and hard to replace with automated data.
6. REFERENCES
Bregman, A. (1999) Auditory Scene Analysis. The Perceptual Organization of Sound. Massachusetts : The MIT Press.
Carlsen J. C. (1996) Las representaciones mentales en la música. Eufonía, 5, pp 67-79.
Cronbach, L. J. (1998) Fundamentos de los Test Psicológicos. Madrid : Biblioteca Nueva.
Chion M. (1983) Guide des Objets Sonores. Paris : Bouchet / Chastel.
Chion M. (1993) La Audiovisión. Introducción a un análisis conjunto de la imagen y el sonido. Buenos Aires : Paidós
Crowder, R. (1994) La mémoire auditive. In McAdams, S. & Bigand, E. (Eds.), Penser les sons. Psychologie cognitive de l'audition. Paris: Presses Universitaires de France, pp.
123-156.
Das J. P., Kar B. C., Parrila R. K. (1998) Planificación Cognitiva. Bases Psicológicas de la Conducta Inteligente. Buenos Aires : Paidós
Deutsch, D. (1982) The Psychology of Music. London: Academic Press.
Dowling, W. J. & Harwood, D. L. (1986) Music Cognition. Orlando, FL: Academic Press.
Ericsson, K. and Simon, H. (1993) Protocol Analysis. Cambridge, MA: The MIT Press.
Francès, R. (1958) La perception de la musique. Paris: VRIN.
Hargreaves D. J. (1986-1998) Música y desarrollo psicológico. Barcelona: Graò.
Howell, P., Cross I. and West, R. (1985) Musical Structure and Cognition, London: Academic Press.
Imberty, M. (1969) L'acquisition des structures tonales chez l'enfant. Paris: Klincksieck.
Kahneman, D. (1997) Atención y Esfuerzo. Madrid: Biblioteca Nueva. Psicología Universidad
Leal A. (1998) Los Cambios en el lenguaje. En Moreno M., Sastre G., Bovet M., Leal A. Conocimiento y cambio. Los modelos Organizadores en la construcción del
conocimiento. Buenos Aires : Paidós. pp 143-184
Madsen, C. (1999) Personal communication with the author.
Moreno M. (1998) La psicología cognitiva y los modelos mentales. En Moreno M., Sastre G., Bovet M., Leal A. Conocimiento y cambio. Los modelos Organizadores en la
Back to index
Proceedings abstract
sag2@is.nyu.edu
Background:
This project builds on a prior study of the general nature of renowned violin
teacher Dorothy DeLay's pedagogical practice. The present investigation
attempts a deeper exploration of DeLay's knowledge base in order to describe
the ways in which basic performance technique and interpretive processes are
understood and externally represented, taught, and integrated into performance
discipline.
Aims:
This qualitative investigation follows the design of a single case study which
includes on-site observation; unstructured interviews; fieldnote, audiotape,
and videotape data collection; systematic data categorization and analysis; and
interpretation and theory development.
Results:
Conclusions:
Back to index
Proceedings abstract
Janet Underhill
junderhill@latinschool.org
Background:
Harmonic structure is the most difficult and least accessible element of music
for the listener to European Art Music, though, like the foundation of a
building, it underpins and structures the music. The appreciation of the great
composer's special skill in manipulating harmony to expressive effect is part
of the richness of the listening experience.
Aims:
This presentation uses the mathematics of set theory, and an animated sequence
constructed in Macromedia Director, to clarify the web of harmonic structure in
the first movement of a piano sonata by Franz Joseph Haydn.
Main contributions:
Implications:
The basic tenets of set theory are straightforward and accessible to the
non-mathematician. A knowledge of these basic principles provides a framework for
the experience of harmony. The visual element provided by the animated sequence
of intersecting sets offers insight into the special skill and genius of the
composer, and enhances the listening experience.
Back to index
Proceedings paper
Introduction
There are clearly two components in musical performance: (a) the skill of being able to play the
instrument, and (b) the mental process of deciding upon a particular musical interpretation. Which of
these two an audience will most appreciate depends on several factors, but tends towards the first
when the piece is technically demanding, so that playing the correct notes with the appropriate
timing is itself an achievement, and towards the second when most pianists of modest ability (e.g.
Associated Board Grade 6 level) could play the piece correctly from a purely technical point of view,
but not highly expressively.
It is argued that these two skills can be separated, and that through the use of suitable computer
software and appropriate additional hardware interfaces, a person who is unable to use their hands
well, or even at all, can nevertheless create their own expressive performance as a recording. Once it
is recorded, it is a (stored) performance like any other.
There is no reason to believe that such disabled people are any more or less musical than anyone else,
and such a system would enable their musical ability to be creatively used and expressed, which
would otherwise be trapped within them.
Vistamusic is a system of software and ancillary hardware interfaces which we have developed for
this purpose. Currently it is for piano music only, and it is capable of successfully tackling most types
of piano music. Extending it to all piano music and to some other instruments is a huge task which
can only be tackled if major funding is made available.
In addition, the system can be used equally by non-disabled people who cannot play the piano to
enable them to nevertheless create their own expressive performances of piano music as a recording.
In this case no additional hardware interfaces are necessary. While some may "sniff" at this, we
believe that this is perfectly valid creative activity in music. Just as with those who cannot use their
hands, provided the piece played is not technically demanding, the interest in the performance will be
in the interpretation, not in being able to play the notes. Clearly it would be inappropriate for either
type of user to play technically demanding pieces (except perhaps for their own private enjoyment).
For either type of user it should be stressed that the credit for the expressive performance lies entirely with the
person creating it. The computer system supplies no musicality. (This is by intent; although it would
be perfectly feasible to build some in, we have chosen not to do so.) It is on the same basis as a word
processor: one would not give the word processor the credit for the writing. Hence "knocking"
comments such as "it's only pressing buttons" or "it's only the computer that is doing it" are invalid, and
deny people, most cruelly disabled users, their due acknowledgement of their musical creativity.
Design criteria
Our goal is for the result to be independent of manual abilities, depending only on the inherent
musicality of the person.
Accordingly our criterion for usability is:-
The system has to be equally usable by a person who can only press one key at a time
and who cannot press a key at a specified time or hold it down for a specified length of
time once it has been depressed.
It is believed that this has been achieved in a way which in no way compromises usability for people
who are able-bodied. In other words it is certainly not "a system for the disabled". By designing
software with such people in mind, rather than by attempting to add some facility afterwards, it has
been made equally accessible to all. This is a philosophy which the author would advocate in the
design of any computer system.
Further, the system should be designed to be as natural to use and as effort saving as possible - with
the minimum of key presses or other physical movement required to achieve a given musical result.
Surely this is something which benefits everyone, not just those with limited manual abilities.
not like to contemplate use for the purpose. That is even assuming that the hardware interfaces could
be optimised. (The latter is by no means always possible unless the owners of the
commercial software were willing to make the changes themselves, or willing to provide the source code
to enable others to make changes.)
For these reasons we went ahead with the development.
a section to operate on more than a single note or chord. With this one can put the cursor
anywhere in the piece and operate on the Duration Unit while leaving the Marked Section
unchanged.
3. Operations on a single note or chord.
One would wish to be able to change the characteristics of how any single note is played, or
any single chord, or particular notes in a chord, e.g. "all except the bass" or "all except the top
note".
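An operation like "all except the bass" can be sketched in a few lines. The representation (a chord as a list of pitch/velocity pairs) and the function name are our illustrative assumptions, not Vistamusic's internal format:

```python
# Hypothetical sketch of one operation on a single chord: scale the loudness
# (MIDI velocity, 0-127) of every note except the bass note.
def scale_chord_velocity(chord, factor, skip_bass=True):
    """chord: list of (pitch, velocity) pairs, sorted ascending by pitch."""
    out = []
    for i, (pitch, vel) in enumerate(chord):
        if skip_bass and i == 0:
            out.append((pitch, vel))  # leave the bass note untouched
        else:
            out.append((pitch, min(127, round(vel * factor))))
    return out
```

For example, applied with a factor of 1.5 to a C major chord at uniform velocity 60, the bass stays at 60 while the upper notes rise to 90.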
4. Auditioning
Flexible and easy to use features for auditioning what had been done are of the greatest
importance.
The following are examples of some of the keys (or equivalent) which can be pressed to give
the specified result for auditioning purposes:-
• Play from the beginning until a "stop" key is pressed, otherwise to the end.
• Play the current Marked Section repeatedly until the "stop" key is pressed.
• First play the previous phrase, following on with the current phrase, then return the cursor to
its previous position.
• Play from the beginning of the current phrase until "stop" is pressed and position the cursor at
the point where "stop" was pressed.
5. Summarising
The ability to hear what has been done and to make modifications accordingly, in a quick and easy
way, is essential. It is vital to listen, make adjustments, and listen again. It is believed that it
would not be possible to produce a good performance of a piece of music without doing this -
and surely this fits with the essence of musicianship.
References
[1] Anderson, T.M. (1990) E-Scape: an extended sonic composition & performance
environment. Proc. of ICMC, Glasgow 1990.
[2] Anderson, T.M. & Smith, C. (1996) 'Composability': Widening participation in music
making for people with disabilities via music software & controller solutions. Proc. Of
ASSETS 96 (ACM/SIGCAPH)
[3] Hunt, A.D. & Kirk, P.R. (1994) MIDIGRID - a computer based musical instrument. J.of
Musical Instrument Technology, June 1994
[4] Dalgarno, G. (1991) A computer based system to enable people who cannot use their hands
well, or at all, to produce music with their own individual expression. Dalgarno, G. Proceedings
of the Institute of Acoustics, November 1991, pp 275-283
[5] Dalgarno, G. (1997) Creating an expressive performance without being able to play a
musical instrument. Brit. J. of Music Ed. (1997) 14:2, 116-171.
[6] Repp, B.H. (1992) Diversity and commonality in music performance: An analysis of timing
and microstructure in Schumann's 'Träumerei'. J. Acoust. Soc. Am. 92 (5), Nov. 1992
Acknowledgements
To Professor John A Sloboda for much advice, support and encouragement.
To the support staff, particularly Chris Woods, in the Dept. of Psychology at Keele for practical help.
To Dr Richard Parncutt (formerly of the Dept of Psychology at Keele) for many helpful discussions.
To Andy Hunt, of the Dept of Electronics, University of York, for many stimulating discussion and
for technical advice on MIDI systems.
To Professor John Paynter, Dr David Kershaw, Mr Richard Orton and Mr Bruce Cole of the Dept. of
Music, University of York, for their help and encouragement.
The work has been possible through grants from the following:-
The Orpheus Trust, The Calouste Gulbenkian Foundation, The Paul Hamlyn Foundation, The
Radcliffe Trust, The Arts Council of England, The Sport and Art Foundation, The National Lottery
Charities Board, whose financial support is gratefully acknowledged.
_______________________
Back to index
Proceedings paper
Jörg Langner
Humboldt University of Berlin
Musikwissenschaftliches Seminar
Unter den Linden 6, D-10099 Berlin, Germany
Phone: +49-(0)30-20932065
Fax: +49-(0)30-20932183
E-mail: jllangner@aol.com
Reinhard Kopiez
(Music Conservatoire Hannover)
Christian Stoffel
Martin Wilz
(University of Cologne)
Introduction
Compared to research on timing, musical dynamics is a neglected parameter in performance research. For example, despite a
focus on musical rhythm, timing and performance, the latest edition of Deutsch's (1999) survey of the whole discipline, The Psychology
of Music, does not even contain a sub-chapter on dynamics, and the remaining research literature is widely scattered. We cannot say whether this
situation is due to a lack of interest, but would rather assume that it is due to a lack of adequate research methods, which prevents a deeper
understanding of the nature of dynamics. To sum up, we can formulate some important research topics:
● Although the history of performance practice shows the increasingly important role of dynamic shaping for conveying expression
in music, we know only very little about the relationship between musical form and musical dynamics. Based on musical
experience we can say that e.g. a Bruckner-Symphony is unimaginable without the form-generating force of dynamics.
● The relationship between timing and dynamics and its importance for musical perception is unclear: they either exist in a
hierarchical relationship (e.g. with a dominance of timing over dynamics), or are of equal importance. In the first case we could
assume that dynamics have only a small effect on a global level and a greater effect on a more local level - this contradicts musical
experience; in the second case the problem of redundancy is evoked: why should we take care of a second expressive parameter
(dynamics), if expression is already mediated by the domain of timing?
● Which methods are available for analysing dynamics and for presenting the results perspicuously? This question concerns the field of
performance analysis as well as that of educational application. Only a highly intuitive and easily manageable analysis of dynamics
will be accepted by the majority of instrumental teachers. This implies a special need for realtime methods of analysis and
presentation.
Some answers to the above mentioned questions can be found in the literature: as one of the founding authors, Riemann (1884) published
a treatise on musical phrasing which concentrated exclusively on the role of dynamics and rubato. His simple assumption was that
dynamics and rubato are coupled, and that the development of an eight bar long musical phrase is shaped simultaneously by a crescendo
and an accelerando until the climax of the phrase. This more global perspective of dynamics seems to be more plausible. Huron's (1991;
1992) perception theory of "ramp archetypes" fits well into this perspective. Huron (1991) calculated a mean length of 4.3 bars for
crescendi and of 5.8 bars for decrescendi using a sample of 537 works or movements of 14 composers with a total of 85476 bars. The
same idea of a simple coupling of the two parameters can be found in Todd's (1990) model of musical expression: "the faster the louder,
the slower the softer" (p. 3540). We do not believe in such a simple, rule-based relationship between the parameters, and assume that this
perspective captures only part of musical reality and allows only a very limited view. However, as Friberg (1991) tried to show, it is
possible to generate decent synthesized performances by use of such a rule-based system.
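The coupling rule "the faster the louder, the slower the softer" can be illustrated with a one-line mapping. The linear form and the constant k here are our own illustrative assumptions, not Todd's published model:

```python
# Sketch of a simple tempo-loudness coupling: a local tempo deviation is mapped
# onto a proportional loudness deviation. The linear form and k are assumptions.
def coupled_loudness(local_tempo, mean_tempo, mean_loudness, k=0.5):
    """Return a loudness value (e.g. in Sone) that rises with local tempo."""
    return mean_loudness * (1.0 + k * (local_tempo - mean_tempo) / mean_tempo)
```

Under such a rule a 10% acceleration always produces the same fixed loudness increase, which is exactly the rigidity the multi-level approach below is meant to avoid.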
We would like to try a different approach: referring to the "Theory of oscillating systems" (TOS) by Langner (1999) we hypothesize that
the timing and dynamics of a performance are shaped on multiple levels, including local and global layers. Local layers concern the
dynamic shaping for example from note to note or from measure to measure; global layers on the other hand are connected to the
relationship of dynamics between larger sections or subsections of a piece of music. (Such multi-level structure can also be found in
other domains of performance; for the analysis of timing see Langner & Kopiez 1995). An adequate method of performance analysis
should preserve the full information contained in the performance data (without any reduction), and as Langner (1999, pp. 153-155;
Method
The procedure will now be described, based on the assumption that the piece of music to be analysed is available as a complete audio file.
An outline then follows of the modifications made when the analysis 'works through' the music step by step in realtime.
(a) Non-realtime procedure
The starting point for the procedure is the digitized audio signal. From this (step 1) the loudness curve of the piece is calculated; this
means that at regular intervals in the piece a loudness value is allocated in Sone units. For this purpose a dedicated computer programme
was used, developed by Bernhard Feiten & Markus Spitzer (Technical University of Berlin) on commission from the
Hochschule für Musik und Theater Hannover (see also Langner, Kopiez & Feiten 1998, pp. 18-20). This programme is based on
Zwicker's model of loudness (Zwicker & Fastl 1990, pp. 197-214), which guarantees close proximity to the perceived loudness and
produces superior quality to the simple use of decibel values.
This loudness curve (step 2) was then subjected to multiple 'smoothing out' processes of varied strength. This 'smoothing out' was
achieved through the inclusion, when measuring at a particular point of time, of not only the loudness value at exactly this point but also
the surrounding values, thus creating a mean measurement. This 'surround' of the point is also termed the 'window' for calculation. The
wider the calculation window, the stronger the smoothing effect. If, in an extreme case, one were to take the length of the entire piece of music as the window, there would be only one mean value for the whole piece, and the smoothing out would therefore be at its maximum. In contrast, a very narrow window would produce a curve very similar to the original loudness curve. There are
many interim steps between these extremes. Concrete graphic examples are to be found in Langner (1997). A strongly smoothed-out loudness curve shows differences (that is, deviations from a horizontal line) only where correspondingly wide-ranging dynamic shaping exists; it is precisely these smoothings of varied strength that permit the multi-layer analysis mentioned in the introduction. Our
procedure uses a wide spectrum of various window sizes. The exact range can be selected within the programme. A frequently applied
setting contains 37 different 'windows', sized between 0.25 and 128 seconds (in logarithmic steps).
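Steps 1-2 can be sketched as follows. This is a minimal illustration, assuming the loudness curve is already given as a list of sone values sampled at regular intervals; the actual programme by Feiten & Spitzer is not described in detail, so the centred moving average and the function names here are assumptions:

```python
def smooth(loudness, window):
    """Centred moving average over `window` samples: each output value is
    the mean of the loudness values in the surrounding 'window'."""
    half = window // 2
    out = []
    for i in range(len(loudness)):
        lo = max(0, i - half)
        hi = min(len(loudness), i + half + 1)
        out.append(sum(loudness[lo:hi]) / (hi - lo))
    return out

def window_sizes(n=37, smallest=0.25, largest=128.0):
    """The n window lengths in seconds, spaced in logarithmic steps,
    matching the frequently applied setting of 37 windows from 0.25 to 128 s."""
    ratio = (largest / smallest) ** (1 / (n - 1))
    return [smallest * ratio ** i for i in range(n)]
```

Applying `smooth` once per window size yields the family of curves, from nearly identical to the original (narrow windows) to nearly flat (wide windows).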
Finally (step 3) the gradient of every smoothed-out curve is calculated at each point in time. The procedure is similar to taking the first derivative in mathematics and physics; here, though, the gradients were calculated separately for each smoothing-out curve, according to its window size. The effect is that the strongly smoothed-out curves (which generally show much weaker fluctuation) can attain gradients just as steep as the weakly smoothed-out ones; the contrasting smoothing levels thus receive 'equal treatment'. With this change to gradients, the analytical perspective shifts from loudness to loudness changes - that is, from loud/soft to crescendo/decrescendo. (This change in perspective has proved itself in previous analyses; the final decision as to whether loudness or loudness changes are actually represented has not yet been reached.)
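Step 3 can be sketched as follows. The text does not give the exact normalisation that grants strongly smoothed curves 'equal treatment', so scaling the average slope by the half-window is an assumption made for illustration:

```python
def gradients(curve, window):
    """Average slope at each point of a smoothed curve, measured across a
    span tied to the window size and rescaled by the half-window, so that
    slow changes in strongly smoothed curves can register as steeply as
    fast changes in weakly smoothed ones. Positive values correspond to
    crescendo (red in the Dynagram), negative to decrescendo (green)."""
    half = max(1, window // 2)
    out = []
    for i in range(len(curve)):
        lo = max(0, i - half)
        hi = min(len(curve) - 1, i + half)
        out.append((curve[hi] - curve[lo]) / (hi - lo) * half if hi > lo else 0.0)
    return out
```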
The output (step 4) is produced in graph form showing the gradients referred to, in a so-called "Dynagram" (see fig. 1 and fig. 2). Time runs along the horizontal axis; the window size is represented on the vertical axis. Red colouring signifies crescendo (the more
intense the red the stronger the crescendo); green colouring then shows decrescendo (the more intense the green, the stronger the
decrescendo).
(b) Realtime procedure
In the realtime version, the audio signal is recorded through a microphone linked to the computer. The procedure described above is
carried out in the same way. In creating the Dynagram, however, it must be remembered that calculating the smoothing out always requires a certain surrounding area of each point in time. One must, as it were, "look into the future" - to a lesser extent for the weaker smoothings, to a greater extent for the stronger ones. (Such "looking into the future" can be compared with the retrospective re-interpretation of what has already been heard, and is from this point of view plausible.) As far as the procedure is concerned, the following is relevant:
the Dynagram can only be calculated retrospectively - with negligible delay for the small window sizes, but with considerable delay for the large ones. The data points of a realtime Dynagram thus appear on the screen not as a vertical line, but approximately as a diagonal one.
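The delay follows directly from the centred window: a value for window size w is only available about w/2 seconds after the fact. A trivial illustration (assuming a symmetric window, as in the sketch above):

```python
def display_delay(window_seconds):
    """Seconds before the Dynagram value for this window size can appear:
    half of the (symmetric) smoothing window must lie in the 'future'."""
    return window_seconds / 2.0

# Delays across the commonly used range of window sizes:
for w in (0.25, 8.0, 128.0):
    print(f"window {w:>6} s -> delay {display_delay(w):>6} s")
```

The spread of delays, from a fraction of a second up to about a minute, is what draws the diagonal front on the screen.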
Results
Figures 1 and 2 show the Dynagram of both a professional and a non-professional performance of Erik Satie's Gymnopédie No.1. The
generally more intense loudness pattern from the professional pianist is noticeable. Particularly remarkable is also the greater intensity in
the larger window area; the shading in this area reveals a correspondence between loudness organization and formal structure. The composition consists of two identical parts, each of which in turn consists of two almost equal-sized sections. (To demonstrate these structural segments, the start of each section is marked in the upper horizontal frame of the Dynagram. The starting points of the two main parts are black; the weaker formal divisions are, in contrast, coloured grey.) It is clear from the Dynagram of the professional player's version that each of the four formal sections is covered by a red-green pair in the window-size area from 8 to 16 s, as are the two halves in the vicinity of 32 s. Clearly the professional pianist is capable of pointedly marking the structure of the composition through the control of dynamics.
Fig. 1: Dynagram of a professional performance of Erik Satie's Gymnopédie No.1. The different colours have the following meaning:
intense red = strong crescendo, pale red = weak crescendo, white = constant loudness, pale green = weak decrescendo, intense green =
strong decrescendo. The dynamic shaping reflects clearly the formal structure of the composition (the formal breaks are marked in the
upper horizontal frame).
Fig. 2: Dynagram of a non-professional performance of Erik Satie's Gymnopédie No.1. The dynamic shaping is not as strong as in the
professional performance and reflects the formal structure of the composition less clearly.
Further applications of the procedure showed Dynagrams to be a way of making visible, in particular, the more extensive loudness shaping of a performance. The analysis of a recording of a movement of a Bruckner symphony (conducted by Günter Wand), for instance, revealed a build-up spanning some 20 minutes from the start to the final climax of the piece.
References
Deutsch, D. (Ed.) (1999). The psychology of music. 2nd edition. New York: Academic Press.
Friberg, A. (1991). Generative rules for music performance: A formal description of a rule system. Computer Music Journal, 15(2),
56-71.
Huron, D. (1991). The ramp archetype: A score-based study on 14 piano composers. Psychology of Music, 19, 33-45.
Huron, D. (1992). The ramp archetype and the maintenance of passive auditory attention. Music Perception, 10(1), 83-92.
Langner, J. (1997). Multidimensional dynamic shaping. In A. Gabrielsson (Ed.), Proceedings of the third triennial ESCOM conference,
Uppsala, Sweden, 7-12 June, 713-718.
Langner, J. (1999). Musikalischer Rhythmus und Oszillation. Eine theoretische und empirische Erkundung. [Musical rhythm and
oscillation. A theoretical and empirical investigation]. Dissertation, Hochschule für Musik und Theater Hannover. (A printed version of
this dissertation, including a comprehensive abstract in English, will be published by Peter Lang Verlag, Frankfurt/Main in 2000 or
2001).
Langner, J. & Kopiez, R. (1995). Oscillations triggered by Schumann's "Träumerei": Towards a new method of performance analysis
based on a "Theory of oscillating systems" (TOS). In A. Friberg & J. Sundberg (Eds.), Proceedings of the KTH Symposium on Grammars
for music performance, Stockholm, May 27, 45-58.
Langner, J., Kopiez, R. & Feiten, B. (1998). Perception and representation of multiple tempo hierarchies in musical performance and
composition. In R. Kopiez & W. Auhagen (Eds.), Controlling creative processes in music (pp. 13-35). Frankfurt a.M.: P.Lang.
Riemann, H. (1884). Musikalische Dynamik und Agogik. [Musical dynamics and agogics]. Hamburg: Rahter.
Todd, N.P. McAngus (1992). The dynamics of dynamics: A model of musical expression. Journal of the Acoustical Society of America,
91(6), 3540-3550.
Zwicker, E. & Fastl, H. (1990). Psychoacoustics. Berlin: Springer.
Back to index
Designed to be used in the accompaniment of improvisation, BIAB has several resources useful in
teaching composition. Students can experiment with harmonic progressions, enter melodies over
harmonies (with MIDI setup), or analyze a wide variety of styles. Teachers can also prepare cassette
tapes for students to practice composing melodies over. It is most useful with a MIDI keyboard, but
serviceable with Quick Time Instruments. Demonstration tracks of a variety of styles are included.
The program's learning curve suggests it is best suited to middle and high school students.
MicroLogic AV MIDI/Digital Audio Emagic $89
http://www.emagic.de/english/products/logicline/mlav.html
MicroLogic AV is the inexpensive, multifunctional entry level program of the Logic series. With up
to 16 audio tracks, easy real-time effects, the integrated stereo sample editor and virtual General MIDI
mixing consoles, MicroLogic AV gets the user familiar with desktop studio technology. Innovative
details such as the interactive real-time windows allow for easy use.
Print Music MIDI Coda $100 http://www.codamusic.com/
A subset of the professional-level music notation program, Finale 2000. This program offers power and sophistication similar to Coda's Finale, but with fewer staves and fewer options. Still offers the
same user interface and many of the options needed for most score preparation. Excellent value and
allows a student to "move up" to the professional-level Finale 2000 without having to learn a whole
new system.
Note: Other notation programs include: Overture 2 from Cakewalk, Finale 2000 from Coda; Encore
and MusicTime from Gvox, and the newest program: Sibelius
Back to index
Proceedings abstract
malloch@ozemail.com.au
Background:
Aims:
Implications
The implications for the understanding of Music Therapy are clear. Here we see
that musicality is at the very heart of human companionship and human sympathy
- thus, we argue that Music Therapy works by engaging the very foundations of
human sympathetic emotional exchange.
Back to index
Proceedings abstract
Background.
Aims.
To measure the vibrato rate and extent in real performances and examine the
results in the context of dynamics, timing and phrase structure among others.
Method.
Results.
Both rate and extent varied in a systematic way and the range of variation was
larger than the perceptual limits. Preliminary observations indicate that
vibrato extent is correlated with sound level and that the rate is higher for
shorter tones and usually increases at the end of tones (cf. the previously
reported "vibrato tails").
Conclusions.
The systematic variation of rate and extent confirms that the vibrato is used
for expressive purposes. The results indicate that vibrato is related to other
performance variables as mentioned above.
Back to index
Proceedings paper
1. Introduction
One significant component of musical understanding is the ability of listeners to cluster together musical materials into categories such as motives,
themes and so on. Salient musical features enable listeners to make similarity judgements between various musical materials and to organise these
materials in meaningful groups. It is maintained, in this study, that feature salience, similarity judgements and categorisation processes are
inextricably bound together in a way that each of these can be defined only in relation to the rest.
Based on a number of definitions for the above notions the Unscramble clustering algorithm has been developed. Given a segmentation of a
melodic surface and an initial representation of each segment in terms of a number of attributes (these reflect melodic and rhythmic aspects of the
segment at the surface and at various abstract levels), the Unscramble algorithm organises these segments into 'meaningful' categories. The
proposed clustering algorithm automatically determines an appropriate number of clusters and also the characteristic (or defining) attributes of
each category. There have been a limited number of attempts to use clustering techniques for organising melodic segments into motivic categories.
A brief survey and comparison of some existing formal models is presented in (Hoetheker et al., 2000).
A number of psychological studies have attempted to examine the notions of melodic similarity and cue abstraction using real melodic material
(e.g. Pollard-Gott, 1983; Carterette et al., 1986; Lamont and Dibben, 1997). The most extended studies however have been performed by I.
Deliège - see overviews in (Deliège, 1997; Deliège and Mélen, 1997) - wherein issues of feature salience (cue abstraction), musical similarity and
prototypical description of categories (imprint formation) in musical listening are empirically examined.
It is interesting to compare the performance of a computational model against the results given in empirical studies. A computational approach
requires explicit representations of the musical materials and detailed formal descriptions of similarity and categorisation processes. The various
processes can thus be traced and analysed step-by-step in a way that usually is not possible in empirical studies.
This study attempts to replicate, by means of computational modeling, two psychological experiments on cue abstraction and categorisation
performed on a monophonic piece by J.S.Bach (Deliège,1996; 1997). The results of the computational approach are compared to the empirical
results, and convergences and deviations are reported. The clusters produced by the algorithm correspond closely to the categories provided in the
empirical study. The application of the algorithm confirms most of the suggestions presented in the psychological studies regarding which cues
play a most significant role in categorisation tasks.
In the following sections, initially the concepts of similarity and categorisation will be discussed. Then, the Unscramble algorithm will be
described. Finally, results of the application of the algorithm on motivic segments of J.S.Bach's Allegro Assai, Finale of the Sonata for Solo Violin
in C major BWV 1005 will be presented and various interesting aspects of the computational experiment will be discussed.
2 Similarity and Categorisation
A commonly encountered hypothesis on which many categorisation models are grounded is that categorisation is strongly associated with the
notion of similarity, i.e. similar entities tend to be grouped together into categories.
However, there are different views on the relation between similarity and categorisation (Goldstone et al., 1994; Medin et al., 1993). On the one
hand, similarity is considered to be too flexible and unwieldy to form a basis for categorisation, i.e. any two entities may be viewed as being
similar in some respect (e.g. a car and a canary are similar in that both weigh less than 10 tons, but these objects are not normally considered to be
members of the same category!). On the other hand, similarity is regarded to be too narrow and restricting to account for the variety of human
categories (e.g. a whale is more similar to fish than to most mammals, yet we still consider it to be a mammal). Goodman (1972) doesn't hesitate to call similarity 'a
pretender, an impostor, a quack' (p.437). Rips (1989) claims that "there are factors that affect categorisation but not similarity and other factors that
affect similarity but not categorisation. ...there is a 'double dissociation' between categorisation and similarity, proving that one cannot be reduced
to the other" (p.23).
The above debate is directly linked to a further issue; that is how entities and their properties are represented. If objects are described in terms of
mainly perceptual (e.g. visual or auditory) properties, then, obviously similarity is insufficient for many categorisation tasks, whereas, if any sort
of properties - perceptual or abstract or relational - are considered then similarity becomes too flexible.
It seems that the notions of categorisation, similarity and the representation of entities/properties are strongly inter-related. It is not simply the case
that one starts with an accurate description of entities and properties, then finds pairwise similarities between them and, finally, groups the most similar together into categories. For a set of properties P, a distance function d and a threshold h, the similarity relation may be defined as: sh(x,y) iff d(x,y) ≤ h (I). In other words, two entities are similar if the distance between them is smaller than a given threshold and dissimilar if the distance is larger than
this threshold.
The above definition of similarity is brought into a close relation with a notion of category. That is, within a given set of entities T, for a set of
properties P and a distance threshold h, a category Ck is a maximal set Ck ⊆ T such that sh(x,y) holds for all x, y ∈ Ck (II).
In other words, a category Ck consists of a maximal set of entities that are pairwise similar to each other for a given threshold h.
A category, thus, is inextricably bound to the notion of similarity; all the members of a category are necessarily similar and a maximal set of
similar entities defines a category. According to definition I, similarity is not merely the inverse of distance, but additionally requires a threshold
that can be determined in relation to a specific categorisation description for a given context.
As the similarity function sh is not transitive, the resulting categories need not be disjoint (i.e. they need not form equivalence classes); in other words, overlapping categories are possible. The distance d(x,y) between two entities x and y, with property values pi and qi respectively, is defined as:
d(x,y) = Σ(i=1..n) w_pi · w_qi · δ(pi, qi),   where δ(pi, qi) = 0 if pi = qi and δ(pi, qi) = 1 if pi ≠ qi   (III)
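Definition (III) amounts to a weighted count of differing attribute values. A minimal sketch, assuming a single salience weight per attribute (so that w_pi and w_qi coincide; the paper keeps them as two separate factors, and the function names here are illustrative):

```python
def delta(p, q):
    """delta(pi, qi) from definition (III): 0 for identical values, 1 otherwise."""
    return 0 if p == q else 1

def distance(x, y, weights):
    """Weighted distance between two segments, each given as a list of
    attribute values; `weights` holds one salience weight per attribute."""
    return sum(w * w * delta(p, q) for w, p, q in zip(weights, x, y))
```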
Step 2. For each of these thresholds, all the similar objects are computed according to definitions (I) and (III), and an undirected graph for each
threshold is created where edges connect similar objects.
Step 3. All the maximal cliques (II) are computed for each of these graphs, resulting in l different clusterings.
Step 4. For each of the l clusterings a 'goodness' value is calculated according to a 'goodness' function (section 2.3.2.1)
Step 5. The clustering that rates highest according to the 'goodness' function is selected and new weights are calculated according to function (IV).
Step 6. The algorithm is repeated from step 1 for the new weights.
Step 7. The algorithm terminates when the newly calculated 'goodness' value is less or equal to the value that resulted during the immediately
preceding run.
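Steps 2-3 can be sketched as follows. This is a simplified reconstruction, not the authors' code: a similarity graph is built for one threshold, and its maximal cliques (definition II) are enumerated with the classic Bron-Kerbosch recursion, one standard way of computing such cliques.

```python
from itertools import combinations

def similar_pairs(objects, weights, h):
    """Step 2: edges between objects whose weighted attribute distance
    (definition III, with one weight per attribute assumed) is <= h."""
    def dist(x, y):
        return sum(w * (a != b) for w, a, b in zip(weights, x, y))
    return {(i, j) for i, j in combinations(range(len(objects)), 2)
            if dist(objects[i], objects[j]) <= h}

def maximal_cliques(n, edges):
    """Step 3: maximal cliques of the similarity graph, i.e. the candidate
    categories of definition (II), via Bron-Kerbosch."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    cliques = []

    def bk(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: a maximal clique
            return
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    bk(set(), set(range(n)), set())
    return cliques
```

Running this over the range of thresholds, scoring each resulting clustering (step 4) and re-weighting the attributes (step 5) would complete the loop of steps 1-7.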
3.2.3 Additional fundamentals
The following definitions are also necessary for the algorithm:
3.2.3.1 'Goodness' of clustering
As the Unscramble algorithm generates a large number of clusterings (one for each possible similarity threshold) it is necessary to define some
measure of 'goodness' for each clustering so as to select the best. Two such measures have been considered:
a) Overlap Function. This simple function provides a measure of the degree to which clusters overlap; the less overlap between clusters, the better.
b) Category Utility. This function favours categorisations with high uniformity (in terms of properties) within individual clusters ('intra-class
similarity') and strong differences between clusters ('inter-class dissimilarity'). Another way of interpreting this is that category utility measures the
prediction potential of a categorisation: it favours clusterings where it is easy to predict the properties of an entity, given that one knows which
cluster it belongs to, and vice versa (Gluck and Corter, 1985).
In the experiments reported in section 4, category utility has been used. The main advantages of this measure are its firm grounding in statistics, its
intuitive semantics, and the fact that it does not depend on any parameters. These measures are discussed in more detail in (Cambouropoulos et al.,
1999).
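A compact sketch of the category utility measure (after Gluck and Corter, 1985), assuming each segment is represented as a tuple of nominal attribute values; the representation of objects as value tuples is an assumption for illustration:

```python
from collections import Counter

def category_utility(clusters):
    """Category utility of a clustering. Rewards high intra-cluster
    uniformity of attribute values relative to their base rates over all
    objects, normalised by the number of clusters."""
    objects = [o for c in clusters for o in c]
    n_total = len(objects)
    n_attrs = len(objects[0])

    def sq_prob_sum(objs):
        # sum over attributes of the squared value-probabilities
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(o[i] for o in objs)
            total += sum((c / len(objs)) ** 2 for c in counts.values())
        return total

    base = sq_prob_sum(objects)
    weighted = sum(len(c) / n_total * (sq_prob_sum(c) - base) for c in clusters)
    return weighted / len(clusters)
```

A clustering that perfectly predicts every attribute value from cluster membership scores highest; one whose clusters mirror the overall value distribution scores zero.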
3.2.3.2 Weighting function.
When a clustering is selected, then the initial weights of properties can be altered in relation to their 'diagnosticity', i.e. properties that are unique to
members of one category are given higher weights whereas properties that are shared by members of one category and its complement are
attenuated. A function that calculates the weight of a single property p could be:
w = m/n - m'/(N - n)   (IV), where:
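The 'where' clause for (IV) is missing from the text above, so the following sketch labels its symbols by assumption: m as the number of category members sharing the property, n as the category size, m' as the number of non-members sharing it, and N as the total number of entities.

```python
def diagnosticity_weight(m, n, m_prime, N):
    """w = m/n - m'/(N - n): high when a property is frequent inside the
    category and rare in its complement, low when it is shared by both.
    Symbol readings are assumptions, since the source's 'where' clause
    is missing."""
    return m / n - m_prime / (N - n)
```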
R1: diatonic pitch, contour and duration patterns for full-length motive (3 attributes)
R2: diatonic pitch, contour and duration patterns for full-length motive, for cell a and cell b (9 attributes)
R3: same as R2 only cells of motive 3 are reversed
R4: same as R3 plus 3 extra statistical attributes: leap, rep, cdir that are meant to reflect melodic properties such as 'smoothness', 'repetitiveness'
Even though experiment II is more difficult to replicate and the above computational experiment is a gross simplification, it is still very interesting
to see that Unscramble is capable of inducing descriptions of the emerging categories that can be used successfully both to classify correctly
UNHEARD instances and to exclude MODIFIED motives that don't fit in these categories for the particular stylistic context (mainly because of
the rhythmic differences).
5 Conclusions
In this paper a computational model for melodic clustering and membership prediction has been presented. An attempt was made to apply this
model on melodic data used in two psychological experiments. Despite the simplifications for the needs of the computational experimentation, it is
still clear that the results obtained by the application of the proposed algorithm support the underlying hypotheses of the empirical studies on cue
abstraction, imprint formation and categorisation - e.g. from a variety of different attributes Unscramble abstracted a number of cues that were
appropriate for the specific categorisation tasks, organised the given melodic segments into plausible categories and successfully categorised new
melodic material into the previously determined motivic groups.
It would be very interesting to attempt a more sophisticated computational replication of the aforementioned experiments. Ideally, the system
should be able to break down automatically the musical surface into meaningful segments, then construct sophisticated representations for each
segment and finally organise these into motivic categories (preliminary such attempts have been made by the author on other melodic data - a
robust model however has not as yet been achieved). Such computational experiments are interesting as the various stages of the analytic process
are transparent to the researcher and the initial hypotheses can be systematically studied.
Acknowledgements
This research is part of the project Y99-INF, sponsored by the Austrian Federal Ministry of Education, Science, and Culture in the form of a
START Research Prize. The Austrian Research Institute for Artificial Intelligence is supported by the Austrian Federal Ministry of Education,
Science, and Culture.
References
Cambouropoulos, E. (1998) Towards a General Computational Theory of Musical Structure. Ph.D. Thesis. University of Edinburgh, U.K.
Cambouropoulos, E., Smaill, A. and Widmer, G. (1999) A Clustering Algorithm for Melodic Analysis. In Proceedings of the Diderot'99
Forum on Mathematics and Music, Vienna, Austria.
Cambouropoulos, E. and Widmer, G. (2000a) Melodic Clustering: Motivic Analysis of Schumann's Träumerei. In Proceedings of the III
Journées d' Informatique Musicale, Bordeaux, France.
Cambouropoulos, E., Crawford, T. and Iliopoulos, C.S. (2000b) Pattern Processing in Melodic Sequences: Challenges, Caveats and
Prospects. Computers and the Humanities, 34:4 (forthcoming).
Carterette, E.C., Hohl, D.V. and Pitt, M.A. (1986) Similarities Among Transformed Melodies: The Abstraction of Invariants. Music
Perception, 3(4):393-410.
Deliège, I. (1997) Similarity in Processes of Categorisation: Imprint Formation as a Prototype Effect in Music Listening. In Proceedings of
the Interdisciplinary Workshop on Similarity and Categorisation, University of Edinburgh, U.K.
Deliège, I. (1996) Cue Abstraction as a Component of Categorisation Processes in Music Listening. Psychology of Music, 24:131-156.
Deliège, I. and Mélen, M. (1997) Cue Abstraction in the Representation of Musical Form. In Perception and Cognition of Music. I. Deliège
and J. Sloboda (eds), Psychology Press Ltd, Hove, U.K.
Gluck, M.A., and Corter, J.E. (1985) Information, Uncertainty, and the Utility of Categories. In Proceedings of the Seventh Annual
Conference of the Cognitive Science Society, Lawrence Erlbaum Associates, Irvine (Ca).
Goldstone, R.L., Medin, D.L. and Gentner, D. (1991) Relational Similarity and the Non-independence of Features in Similarity Judgements.
Cognitive Psychology, 23:222-262.
Goodman, N. (1972) Seven Strictures on Similarity. In Problems and Projects, by N. Goodman, The Bobbs-Merrill Company, Inc., New
York.
Hoetheker K., Hoernel D. and Anagnostopoulou C. (2000) Investigating the Influence of Representations and Algorithms in Music
Classification. Computers and the Humanities, 34:4 (forthcoming).
Lamont, A. and Dibben, N. (1997) Perceived Similarity of Musical Motifs: An Exploratory Study. In Proceedings of the Interdisciplinary
Workshop on Similarity and Categorisation, University of Edinburgh, Edinburgh.
Medin, D.L., Goldstone, R.L. and Gentner, D. (1993) Respects for Similarity. Psychological Review, 100(2):254-278.
Pollard-Gott, L. (1983) Emergence of Thematic Concepts in Repeated Listening to Music. Cognitive Psychology, 15:66-94.
Rips, L.J. (1989) Similarity, Typicality and Categorisation. In Similarity and Analogical Reasoning. S. Vosniadou and A. Ortony (eds),
Cambridge University Press, Cambridge.
Tversky, A. (1977) Features of Similarity. Psychological Review, 84(4):327-352.
Back to index
Proceedings paper
Irène Deliège
University of Liège
URPM - CRFMW
Department of Arts and Sciences of Music
In the introductory theoretical talk of this symposium, the different aspects of the
SIMILARITY-DIFFERENCE model were sketched briefly. Other approaches then addressed certain
particular aspects and investigated empirically the rôle of cue abstraction in the model in the
processes of segmentation, the organisation of memory and the categorisation of musical events
during listening. The present research is also concerned with the rôle of cue abstraction, but, more
precisely, with the effect of prototypicality-which I have called the formation of imprints-that is, the
result of insistent use by the composer of the same cue in more or less varied forms.
Recently, during a meeting that Pierre Boulez devoted to the analysis of his latest work Sur Incises
(filmed presentation, November 1999), it occurred to me that the two fundamental elements of the
model - the principles of SIMILARITY and DIFFERENCE and the mechanism of cue
abstraction-expressed certain processes developed by the composer in the production of his work.
Indeed, Pierre Boulez stressed the primary rôle played in his compositional technique by very short basic figures - he labelled them according to their audible effect: the seed, the slap in the face, etc. - and
showed how these cells were later exploited, developed and varied while still remaining recognizable.
He emphasized that it was necessary for the listener to pick out these figures if he or she was to be
able to keep track of the intended meaning of the composition.
The experimental results that have been presented have demonstrated the different effects of
SIMILARITY and DIFFERENCE, together with the effects of abstracted cues in the development of
the mental schema of a work. For all that, we cannot assume that the listener is in a position to
memorise exactly the various operations that have been performed on the "primary cells" - the
cues-during the process of composition. In fact-and it is worth emphasising this point-memory
simplifies the global information and effectively finds "statistical means"-the imprints-that retain the
essential information about a collection of more or less similar presentations of a given cue.
In this way, the most important feature of the SIMILARITY-DIFFERENCE model is the idea of
similarity as a principal axis in the real-time processes of categorisation that occur during listening
and its organisation around some central exemplar of a given cue - the prototype. Intuitively, this
argument seems plausible in the case of music perception. It formed the basis of a number of
theoretical and empirical studies in the field of categorisation of the environment carried out during
the seventies including the work of Rosch which we have referred to above in connection with our
own research.
At around the same time, the so-called "exemplar models" (see in particular Medin and Schaffer
1978), also involved the idea of similarity but gave less emphasis to the idea of a central prototypical
tendency and recognized the important influence of specific traits of the different "exemplars" of a
given category which were supposed to be saved in memory. These approaches seem at first sight to
be close to each other and moreover seem not to contradict each other. According to Barsalou (1990),
it is even hard to distinguish between them on an empirical level. On the other hand, certain schools of
thought, perhaps too quickly influenced by a statement by the philosopher Nelson Goodman, cited
many times in this kind of context, which presented the concept of similarity as "a pretender, an
impostor, a quack" (1972, p.437) on the grounds that it was too vague and elastic, have developed
other theoretical arguments, demolishing the principal rôle that had been accorded up to that point to
similarity in categorisation and, consequently, the rôle of the prototype.
To put it briefly, we are talking here about the ad hoc models (Barsalou, 1983) that suggest that other
categorical forms could be developed on quite different principles, in particular, on the basis of
particular circumstances-for example, the collection of things that one gathers together to be taken on
a journey, or to make a meal, etc. To that extent, these models seem to be closely linked with Schank
and Abelson's (1977) idea of scripts and Minsky's (1975) idea of frames. About half a century before
that, Vygotsky and Luria (in Wertsch, 1985) distinguished between contextualised and
decontextualised categorisation behaviour that was strongly influenced by the educational background
of the individual. They noted, in particular, that the illiterate peasants of Uzbekistan grouped objects
only according to use or activities performed in the context of their daily life. These authors
emphasised that this type of behaviour becomes blurred as soon as literacy intervenes and disappears
completely after one or two years of education, giving way to a process more guided by abstract principles.
We would therefore recognise the existence of a plurality of categorisation strategies, adapted to the
circumstances and requirements of the subject which initiate grouping processes that are either
essentially functional where the context and thematic aspects are more important than relationships of
similarity; or taxonomic where the notion of similarity is predominant. We have proposed that the
processes of categorisation that occur in real-time during listening belong to the latter category.
Moreover, it would seem that this mode of categorisation could be considered relatively "universal," it
being found in a variety of cultural environments. It is also this mode of categorisation that is chosen
naturally by a child of about 5 years of age: it has been noticed that at this age, classifications operate
on the principle of "global similarity" rather than being a function of particular attributes (Smith,
1981; Keil, 1987). These various reports have given rise in recent times to a resurgence of interest in
the idea of similarity on the part of a number of authors who are today, once again, pleading in favour
of this point of view (Goldstone 1995; Hampton 1997). Not so long ago, James Hampton expressed
this idea in unequivocal terms as follows:
for everyday purposes we are content to continue putting together things that are
(superficially or deeply) similar. After all, such a system serves us perfectly well for most
daily purposes. (1997, p. 109)
see Cambouropoulos, this symposium) and to approach the possible effect of new experimental
variables on the imprint formation process. Therefore new conditions were introduced, allowing us to
investigate not only the role of musical training (T), but also the effect of (i) familiarisation (F) with
the musical context, (ii) length of the experimental sequences, and (iii) musical parameter concerned,
i.e. rhythmic vs pitch errors, introduced in the modified sequences. These aspects will be analyzed
under the factor EXPERIMENT (E).
As already explained, an imprint is considered to be an average value of the characteristics of a
category. Thus sequences whose features are too distant from this central value should induce hesitant responses. Moreover, questions already raised in the preliminary experiment
concerning the effect of the amount of information processed - one or two-bar items -, will receive
here a more extended analysis. In addition, it is here hypothesized that the number of familiarization
listenings to the piece (factor F) should improve subjects' performances. Finally, it seems reasonable
to expect a more significant effect for the rhythmic than for pitch modifications: a change of pitch
might even not be noticed whereas a rhythmic modification seems to damage more effectively the
integrity of the memorized musical sequence. The effects of the above factors should also have a clear
influence on the degree of certainty the subjects will be given for their responses.
GENERAL METHODS
1. Materials
The same piece by J. S. Bach - Allegro Assai of the Sonata for violin solo in C major BWV 1005 - was
employed. The first part of the piece (bars 1-42) and the sequences requested for the four different
experiments were played by Mira Glodeanu on a baroque violin and recorded on a DAT. For playback
during the tasks, a cassette player and 2 Yamaha loudspeakers powered by a Denon amplifier were
employed.
Four different series of items were prepared, one for each experiment. They are based on the two
parent-motifs of the piece (motifs A and B, see Cambouropoulos, figure 2, this symposium) and some
of their variations.
Two series presented two-bar items, and the other two employed one-bar items. The latter were made up by dividing the two-bar items into two one-bar sequences. Consequently, an equal number of first-bar and second-bar items appear in the one-bar series.
As in the preliminary experiment, each series was built up in three parts: HEARD (H), i.e. items taken from the first part of the piece, already heard in the acquisition phase; UNHEARD (UH), i.e. items borrowed from the second part of the piece, which the subjects did not hear; MODIFIED (M), i.e. items containing a small rhythmic or pitch modification.
Criteria chosen to set up the modified items
The effect of modifications introduced in the original text was studied in relation to rhythm modifications in Experiments I and III and pitch modifications in Experiments II and IV. All these modifications were introduced into items of the HEARD part.
a) Rhythm modifications: As in the preliminary experiment, one complete beat of the item was modified, but no changes were introduced in the first beat, because some of the items have only three semi-quavers in that beat and because changes located at the very beginning of the item would have made responses too easy. The modified beat was replaced by:
2 quavers
1 quaver + 2 semi-quavers
1 dotted quaver + 1 semi-quaver
or 1 crotchet
The original pitches were always preserved.
In the one-bar items, modifications occurred an equal number of times on the second and the third beat, using each of the planned types of modification an equal number of times.
In the two-bar items, (i) one complete set of items presented a single modification bearing on the first beat of the second bar; (ii) another set was designed by the "addition" of bars 1 and 2 of the sets of one-bar items. Modifications alternated between the second and third beats.
b) Pitch modifications: These altered a single note, chosen so that the first, second, third and fourth semi-quavers of each bar of the original item were affected an equal number of times. However, the first three semi-quavers of an item were never changed, to avoid making responses too easy. Modifications appeared an equal number of times and were located as follows:
1st beat modified on the 4th semi-quaver
2nd beat modified on the 1st or 3rd semi-quaver
3rd beat modified on 2nd semi-quaver
In the one-bar items, the whole set of items of the HEARD part was modified according to this plan.
In the two-bar items, three different sets were built by "addition" of the previous one-bar items: (i) one set with changes in the second bar only; (ii) another set with the contrary, i.e. changes in the first bar only; (iii) a final set with changes in both bars of the item.
2. Participants
600 adult subjects - 300 musicians and 300 non-musicians - took part in the four experiments. The
musicians were students of Royal music conservatories in Belgium and the non-musicians were
students of post secondary schools. They were between 18 and 25 years of age (average = 22).
3. Procedure
Subjects were tested in groups. The instructions were given on a form, stating explicitly the tonality of the piece. Subjects had 10 minutes to read and ask questions. It was explained that they would be asked to listen to the first section of the Bach piece - twice, six or ten times, depending on their condition - followed by a set of short items taken from that section, from the non-heard section, or slightly modified, and that their task was to respond, for each item, whether they had already heard it or not. In addition, they were asked to indicate their degree of certainty in their responses. They received a response form to be completed. The forms displayed a number of items in accordance with the experiment: 48 for Experiments I and II; 30 for Experiment III; 40 for Experiment IV. The degree of certainty for each item was symbolised by a horizontal line of 10 cm. Subjects were invited to draw a small vertical line at the point corresponding to their feeling of certainty.
• Musicians performed significantly better except for UH items: H = 74 vs 65% [F(1,148)=14, p <.001]; UH = 46 vs 44% [F(1,148)=.6, p =.43]; M = 76 vs 66% [F(1,148)=13, p <.001].
• F did not significantly influence the results, except for H items [F(1,147)=6, p <.005].
(ii) The main effects (T and F) tested by a two-way ANOVA recorded a significant effect of T [F(1,144)=24, p <.001], but not of F.
For the H items, the main effects were significant: Musicians performed better [F(1,144)=16, p <.001], and a greater number of previous listenings produced higher results [F(2,144)=6, p =.002].
No effect was reported for UH, and for M items, only the effect of T was significant [F(1,144)=14, p
<.001].
(iii) The Friedman test recorded a strong influence of the type of item. The mean ranks are respectively 2.3 (H), 1.3 (UH), and 2.5 (M): χ2 = 124, p<.0001. Considering the data from the Musicians and the Non-Musicians separately, a similar effect is observed: respectively mean ranks 2.2 (H), 1.3 (UH), and 2.6 (M): χ2 = 69, p<.0001; and 2.3 (H), 1.3 (UH), and 2.4 (M): χ2 = 57, p<.0001. The best results were thus recorded for the M items, followed by the H items; as expected from the prototype effect, UH items were the least successful.
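The Friedman statistic reported throughout these results can be reproduced from per-subject scores. A minimal Python sketch, using invented illustrative data (not the study's), computes the chi-square from the same mean ranks quoted in the text:

```python
import numpy as np

def friedman_chi2(scores):
    """Friedman chi-square for an (n_subjects x k_conditions) array.

    Ranks each subject's scores across the k conditions, then applies
    chi2 = 12n/(k(k+1)) * sum_j (Rbar_j - (k+1)/2)**2 on the mean
    ranks Rbar_j. No tie correction (adequate for continuous scores).
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    order = scores.argsort(axis=1)            # columns sorted per subject
    ranks = np.empty_like(scores)
    ranks[np.arange(n)[:, None], order] = np.arange(1, k + 1)
    mean_ranks = ranks.mean(axis=0)
    chi2 = 12 * n / (k * (k + 1)) * ((mean_ranks - (k + 1) / 2) ** 2).sum()
    return chi2, mean_ranks

# Illustrative per-subject percent-correct scores shaped like the
# reported pattern (M slightly above H, UH clearly lowest).
rng = np.random.default_rng(0)
n = 150
scores = np.column_stack([
    rng.normal(70, 10, n),   # H  (heard)
    rng.normal(46, 10, n),   # UH (unheard)
    rng.normal(71, 10, n),   # M  (modified)
])
chi2, mean_ranks = friedman_chi2(scores)
print(f"chi2 = {chi2:.1f}, mean ranks (H, UH, M) = {np.round(mean_ranks, 2)}")
```

With data of this shape the statistic comes out large, comparable in magnitude to the values reported above.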
3. Results of Experiment II (n = 150)
Globally, 57% correct responses (sd = 10%) were observed, i.e. 65% for H items (sd = 14%), 50% for UH (sd = 18%), and 57% for M items (sd = 18%). Already at this point of the analysis, it is observed that the pitch modifications were less well perceived than the rhythmic ones.
(i) A one-way ANOVA on T and F showed that:
• Musicians performed significantly better than non-musicians except for HEARD items: H = 64 vs 65% [F(1,148)=.2, p =.64]; UH = 56 vs 44% [F(1,148)=18, p <.001]; M = 65 vs 49% [F(1,148)=35, p <.001].
• F had no influence on the results, except for M items [F(1,147)=11, p <.001].
(ii) The main effects (T and F) tested by a two-way ANOVA were strongly significant: T [F(1,144)=37, p <.001]; F [F(2,144)=9, p <.001]. Nevertheless, the mean percentages for 10, 6, and 2 previous listenings are respectively 68, 59, 58% for the Musicians and 55, 55, 49% for the Non-Musicians. Thus the progress is located between 6 and 10 listenings for the Musicians and between 2 and 6 for the Non-Musicians [F(2,144)=5, p <.006].
Considering the types of items, it is observed that for the H items the main factors did not show any effect. On the contrary, they were strongly significant for the UH and the M items. The results are, respectively,
T [F(1,144)=19, p <.001]; F [F(2,144)=2.5, p =.009] for the UH items, and
T [F(1,144)=42, p <.001]; F [F(2,144)=14, p <.001] for the M items.
(iii) Again, the Friedman test indicated a strong influence of the type of item: mean ranks respectively 2.4 (H), 1.6 (UH), and 2.0 (M): χ2 = 47, p<.0001. For the Musicians' and Non-Musicians' data considered separately, a similar effect was observed: respectively mean ranks 2.2 (H), 1.6 (UH), and 2.2 (M): χ2 = 16, p=.0003; and 2.6 (H), 1.6 (UH), and 1.9 (M): χ2 = 40, p<.0001. In comparison with Experiment I, and in line with the prototype effect, UH items were again the least successful. In addition, these results show an influence of the modified musical parameter: as expected, rhythmic errors were better perceived than pitch errors.
4. Results of Experiment III (n = 150)
This experiment employed two-bar items including two different types of rhythm modification: M1 = one modification in the second bar; M2 = two modifications, i.e. one in each bar.
Globally, 66% correct responses (sd = 13%) were observed, i.e. 76% for H items (sd = 19%), 57% for UH (sd = 19%), 57% for M1 (sd = 23%) and 76% for M2 (sd = 21%).
(i) A one-way ANOVA on T and F showed that:
• Musicians performed significantly better, although for H items the difference is not statistically significant: H = 78 vs 73% [F(1,148)=3.4, p =.07]; UH = 60 vs 53% [F(1,148)=6, p =.02]; M1 = 63 vs 51% [F(1,148)=12, p =.001]; M2 = 82 vs 69% [F(1,148)=15, p <.001].
• F influenced the results in all respects: H = 82, 77 vs 68% [F(1,147)=7, p =.001]; UH = 64, 54 vs 53% [F(1,147)=5, p =.007]; M1 = 68, 52 vs 52% [F(1,147)=8, p <.001]; M2 = 82, 75 vs 70% [F(1,147)=4.7, p =.01].
(ii) The main effects (T and F) were strongly significant, as shown by the two-way ANOVA: T [F(1,144)=28, p <.001]; F [F(2,144)=19, p <.001]. F was more effective for the Musicians: the mean percentages recorded for 10, 6, and 2 previous listenings were respectively 82, 69, 62% for the Musicians and 66, 60, 59% for the Non-Musicians [F(2,144)=4.4, p <.01].
For the different types of items considered separately, the main factors (T and F) remain significant. A stronger effect of F was observed in Musicians for the M2 items [F(2,144)=3.2, p =.04], which might indicate an influence of two-bar items modified rhythmically twice.
(iii) As for Experiments I and II, the Friedman test indicated a strong influence of the type of item: mean ranks respectively 3.0 (H), 2.0 (UH), 2.0 (M1) and 3.1 (M2): χ2 = 99, p<.0001. The best results were recorded for the H items and the M2 items.
When items were modified only once, in bar 2, they were rejected less often: the absence of changes at the head of the motif had a strong effect on the responses. For the Musicians' and Non-Musicians' data considered separately, similar effects were observed: respectively mean ranks 2.9 (H), 1.9 (UH), 2.0 (M1) and 3.2 (M2): χ2 = 56, p<.0001; and 3.0 (H), 2.0 (UH), 2.0 (M1) and 3.0 (M2): χ2 = 46, p<.0001. This analysis corroborates the observation made above that the H and the M2 items received the best responses in both groups of subjects.
5. Results of Experiment IV (n = 150)
This experiment also employed two-bar items; three different types of pitch modification were included: M1 = one modification in the 2nd bar; M2 = one in the 1st bar; M3 = two modifications, one in each bar.
Globally, 61% correct responses (sd = 14%) were observed, i.e. 72% for H items (sd = 19%), 66% for UH (sd = 18%), 44% for M1 (sd = 21%), 59% for M2 (sd = 25%) and 65% for M3 (sd = 25%).
(i) The one-way ANOVA on T and F showed that:
• Musicians performed significantly better, although for M1 items the difference is not statistically significant: H = 78 vs 67% [F(1,148)=20, p <.001]; UH = 70 vs 62% [F(1,148)=7.5, p =.007]; M1 = 45 vs 43% [F(1,148)=0.2, p =.62]; M2 = 71 vs 48% [F(1,148)=37, p <.001]; M3 = 76 vs 53% [F(1,148)=57, p <.001].
• F influenced the results except for UH and M1 items: H = 78, 72 vs 68% [F(1,147)=5, p <.001]; UH = 70, 65 vs 62% [F(1,147)=2.7, p =.07]; M1 = 49, 42 vs 41% [F(1,147)=2.0, p =.13]; M2 = 71, 55 vs 52% [F(1,147)=10, p <.001]; M3 = 71, 65 vs 58% [F(1,147)=4.4, p =.01].
(ii) The two-way ANOVA showed that T and F are strongly significant, except for the M1 items. On the global percentages we observe T [F(1,142)=60, p <.001]; F [F(2,142)=16, p <.001].
(iii) The Friedman test on the type of items recorded the worst results for the M1 items: mean ranks 3.7 (H), 3.2 (UH), 1.9 (M1), 2.8 (M2) and 3.3 (M3): χ2 = 121, p<.0001. Similar effects were observed for the Musicians' and the Non-Musicians' data considered separately.
6. Degrees of certainty
(i) A one-way ANOVA on each main factor (T, F and E) showed that:
• Musicians generally gave higher degrees than Non-Musicians, both for correct (C) and incorrect (I) responses { C = 7.2 vs 6.3 [F(1,598)=49, p <.001]; I = 6.4 vs 6.0 [F(1,598)=9, p =.004] }.
• For C responses, but not for I responses, the degrees increased with the number of opportunities to listen to the piece { C = 7.1, 6.8 vs 6.3 [F(1,597)=13, p <.001] }.
(ii) With a three-way ANOVA (2 x 3 x 4) on the three main factors (T, F, E), similar significant results were observed { T: [F(1,582)=49, p <.001]; F: [F(2,582)=12, p <.001]; E: [F(3,582)=12, p <.001] }.
Musicians gave higher degrees the more previous listenings they had heard: mean degrees for 10, 6, and 2 listenings were respectively 7.6, 7.0, 6.4 for the Musicians and 6.2, 6.3, 6.1 for the Non-Musicians [F(2,582)=9, p <.001]. Thus subjects' confidence in their responses was related to the length of familiarization.
In addition, it appeared that the Familiarization listenings induced more progress for Experiments I and II; more stable degrees were reported for III and IV [F(6,582)=4.4, p <.001].
CONCLUSIONS
As already noticed in the preliminary study (Deliège, 1997), the results reveal an Imprint effect operational for all subjects. The Friedman tests established that the weakest performances were recorded for the UNHEARD items, which were accepted as having already been heard during the Familiarization listenings. Concerning the factor TRAINING, musicians gave better responses, a result which corroborates previous remarks: "Non-musicians had less capacity to counteract the effect of imprint by calling on an understanding of explicitly analytic and syntactic associations. But associations of this kind were not always sufficient for the musicians either; the effect of imprint supplanted acquired knowledge in many instances." (ibid. p. 63). An additional Training effect was also observed for the two-bar items: Musicians performed better with longer items (Exp. III & IV), showing that an extended amount of information favored only the trained subjects.
Familiarization was not considered in the previous study. Broadly speaking, performance improves when the number of familiarization listenings is increased. However, some specific results showed that this effect was stronger for Musicians: again, trained subjects had a clear advantage on this point, particularly for the longer items, which reinforces the remark made above about those items.
The two-bar items, in relation to the Modified items, led to interesting new observations. Indeed, the M1 items of Experiments III and IV did not elicit very good responses. Those items were modified only in the second bar, which might signify an unexpected effect of the absence of modification at the head (beginning) of the sequences. Pattern recognition was made on the basis of the very beginning, and subjects did not pay attention to what happened afterwards. This particular impact of the "heads" of the units was already reported by Anderson in his ACT* model (1983, pp. 52-53), in relation to a study by Horowitz (1968).
Another aspect investigated in this study concerned the degrees of certainty given by the subjects for their responses. All three factors (T, F, E) had a significant effect. TRAINING favored confidence in the responses for Correct as well as Incorrect responses, and FAMILIARIZATION led to higher degrees, especially for one-bar items.
Considering finally the style of the piece, which should influence the responses recorded for the MODIFIED items, it was again observed that the imprint, incorporating an average value of the main stylistic characteristics, is formed rather rapidly after the cue abstraction. In general, the M items were rejected as not having been heard previously. But, when comparing rhythm and pitch modifications (i.e. results of Experiments I vs II, and III vs IV), it was observed that rhythm modifications were much more effective in inducing rejection. These results clearly support Schoenberg's already-cited statement: "The preservation of the rhythm allows extensive changes in the melodic contour" (1967, p. 30).
REFERENCES
Anderson, J.R. (1983) Architecture of Cognition, Cambridge, Mass., Harvard University Press.
Barsalou, L.W. (1983) Ad hoc categories. Memory and Cognition, 11, 211-227.
Barsalou, L.W. (1990) On the indistinguishability of exemplar memory and abstraction in category
representation. In T.K. Srull & R.S. Wyer Jr (eds), Advances in social cognition, vol. 3, Hillsdale, NJ,
Lawrence Erlbaum, p. 61-88.
Deliège, I. (1997) Similarity in processes of categorization. In Proceedings of SimCat 1997,
Edinburgh University, 59-65.
Goldstone, R.L. (1995) Mainstream and avant-garde similarity. Psychologica Belgica, 35, 145-165.
Goodman, N. (1972) Seven strictures on similarity. In N. Goodman, Problems and Projects. New York, Bobbs-Merrill, p. 437-447.
Hampton, J.A. (1997) Similarity and Categorization. In Proceedings of SimCat 1997, Edinburgh
University, 103-109.
Horowitz, L.M., White, W.A., & Atwood, D.W. (1968) Word fragments as aids to recall: the organization of a word. Journal of Experimental Psychology, 76, 219-226.
Keil, F.C. (1987) Conceptual development and category structure. In U. Neisser (ed.), Concepts and conceptual development: Ecological and intellectual factors in categorization. Cambridge, Cambridge University Press, pp. 175-201.
Medin, D.L. & Schaffer, M.M. (1978) Context theory of classification learning. Psychological
Review, 85, 207-238.
Minsky, M. (1975) A framework for representing knowledge. In P.H. Winston (ed.) The psychology of computer vision. New York, McGraw-Hill.
Schank, R.C. & Abelson, R.P. (1977) Scripts, plans, goals and understanding. Hillsdale, N.J.,
Lawrence Erlbaum.
Proceedings paper
Numerous studies have established the importance of phrase cues in melodic perception and recognition. We know that listeners use
phrase cues to help make sense of musical structure (Gregory, 1978; Lerdahl & Jackendoff, 1983; Sloboda & Gregory, 1980), to
learn new songs (Sloboda, 1977, Sloboda & Parker, 1985), and to recognize previously learned songs (Tan, Aiello & Bever, 1981;
Chiappe & Schmuckler, 1997). We know less about when such strategies develop. Since not all of the world's music has the same
phrase structure, we can assume that a certain amount of our response to musical phrase information is learned at some point, but
when?
A recent study of children's melodic memory found that children as young as 7 years old use phrase cues as a grouping strategy in
their memory for melodies even when they are taught the song as an entire unit (Demorest, 1999). Children from grades K-1 and grade 4 were asked to reconstruct two previously learned songs by putting four melody blocks in the right order. In one condition the melody blocks matched the phrases of the melody; in the other, the blocks were of similar length but divided against the phrase break. All of the children reconstructed a regular and an irregular set of blocks. The hypothesis was that if phrase cues were important in children's melodic representations, then it should be harder to reconstruct melody blocks that did not match their internal representations. There was a significant difference in the number of times subjects had to listen to the blocks and in the number of moves, with the irregular condition requiring more listenings and more moves to complete the task. This was true for both the younger and the older children, though older children required fewer operations overall to complete the task in either condition.
In this earlier study the children were taught two common (though unfamiliar) children's melodies in English to maximize the
ecological validity of the song-learning task. A number of studies have found evidence for the integration of text and melody in the
musical memory of adults (Crowder, Serafine, & Repp, 1990; Serafine, Crowder & Repp, 1984; Serafine, Davidson, Crowder &
Repp,1986) and children (Chen-Hafteck, 1999; Feierabend, Saunders, Holahan, & Getnick,1998; Morrongiello & Roes, 1990). The
relationship between text and melody seems to be complex, and may be affected by subject experience, the nature of the memory
task, and the subjects' culture. For children, text seems to be a crucial dimension in melodic recognition, and plays a significant role
in melodic memory (Feierabend, Saunders, Holahan, & Getnick,1998; Morrongiello & Roes, 1990). It is possible that the children in
my earlier study responded more to text phrase cues than musical phrase cues to help them memorize the songs, and then used that
information when reconstructing the melody. This study investigates the role of musical phrase cues in children's melodic memory in
the absence of meaningful text phrase cues.
Method
The participants were 39 children aged 6-11 years from the Northwestern United States. They were taken from kindergarten (n=9),
grade 3 (n=15), and grade 5 (n=15) of a local elementary school where they received general music instruction twice a week for 30
minutes. All of the children were taught two standard children's songs by their music teacher using the whole song or immersion
method of rote teaching (i.e. not phrase by phrase) as a regular part of their music class. The two songs were Appalachian folk
melodies, but both songs were set to an unfamiliar language (Maori) to remove text cues. This was considered the best solution to the
issue of text cues since learning songs without text (e.g. on "la") would be quite unnatural for children this age, whereas learning
songs in another language is not uncommon.
Phrase memory was tested using a reconstruction paradigm in which subjects are asked to demonstrate their memory for a melody by reconstructing a familiar four-phrase melody on a computer from four pieces or "blocks" that were placed out of order (Figure 1).
Children clicked on a block to hear the melody fragment it contained and then placed the blocks in a left-to-right sequence to
reconstruct the song. The reconstruction approach is more involved than simple recognition, but does not rely on children's
performance skills in testing their melodic recall.
Each subject performed the reconstruction task on the two melodies they had been taught in music class. In the regular phrase
condition, the four melody blocks represented the four phrases of the melody while in the irregular phrase condition, the blocks were
broken either before or after the natural phrase break. Figure 2 shows the regular and irregular divisions for Melody 1. The number of
notes per block is very similar, but the difference is in the placement of the division. Thus if the memory task were simply a matter of
remembering notes, there should be no difference in task difficulty between the two conditions. If however, children rely on
structural information such as phrases as a memory aid, then the irregular condition would not match their internal melodic
representation, and should be more difficult to reconstruct. All children reconstructed one melody in each condition. As with the earlier study, the hypothesis was that if phrase groupings were important in melodic memory, it should be more difficult to reconstruct the irregular phrase groupings than the regular. This difficulty would be reflected statistically as a within-subject difference between the two phrase conditions.
Figure 1. The reconstruction task using Impromptu (Bamberger & Hernandez, 1992-2000).
Figure 2. One of the test melodies in regular and irregular phrase groupings shown with the Maori text.
Results.
The primary question of the study was the influence of melodic grouping on children's reconstruction performance in the absence of
text cues. Repeated measures analysis of variance revealed a significant difference in performance between the regular and irregular
phrase conditions. Subjects had to listen an average of 2.67 more times to reconstruct the irregular melody blocks than the regular
melody blocks [F (1, 33) = 12.77, p=.001]. This was true regardless of the age of the subject. Figure 3 shows the mean scores by age
group for the regular and irregular phrase conditions. There were no significant between-subject differences due to age [F (2, 33) = 0.69, p=.507] or musical training [F (1, 33) = 0.03, p=.864], but there was a significant age by training interaction [F (2, 33) = 4.74, p<.05]. This interaction can be seen in the graph in Figure 4, which demonstrates that while students with private training in grades three and five needed fewer hearings overall to complete their reconstructions, kindergartners with private training performed worse.
Figure 3. Mean number of hearings required by each age group in the two phrasing conditions.
Figure 4. Mean number of hearings required for subjects with and without private training.
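Since the phrase-condition factor has only two within-subject levels (regular vs. irregular), the repeated-measures F for that factor equals the square of a paired t statistic. A minimal sketch with invented illustrative counts (not the study's data):

```python
import math

def paired_t(x, y):
    """Paired t statistic for two within-subject conditions.

    With only two levels, the repeated-measures F for the condition
    factor is t**2 (same p value).
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical hearings needed per child for each block condition.
irregular = [9, 11, 8, 12, 10, 9, 13, 10]
regular = [7, 8, 6, 9, 8, 7, 10, 8]
t = paired_t(irregular, regular)
print(f"t = {t:.2f}, F = {t * t:.2f} on (1, {len(regular) - 1}) df")
```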
Discussion
Reconstructing melodies whose segments do not correspond to melodic phrase breaks is a more difficult task for children even without the benefit of meaningful text cues to aid in segmenting the melody. This suggests that children as young as 6 years old recognize and employ purely musical phrase cues in song acquisition and memorization. In fact, when the performance of these students was compared with that of the earlier study, they actually required fewer moves overall to reconstruct the same melody in either condition. This difference may be due to a number of factors, including the quality of music instruction at the different schools and other aspects of musical background, but it does suggest that the lack of text did not impair students' performance. Indeed, some research has suggested that simultaneous presentation of text and music can actually hamper song-learning and song performance (Goetze, 1986; Levinowitz, 1989; Welch, Sergeant, & White, 1995/1996). Future research should directly compare students' reconstruction performance with and without text cues in song-learning.
The lack of an age-related difference in overall performance contradicts the findings of the earlier study (Demorest, 1999) and is not consistent with overall developmental improvements in memory. Perhaps the lack of a meaningful text reduced the memory advantage for older students in the study, forcing them to rely on their musical memory alone. It would be interesting to see in future studies whether age-related differences in melodic memory depend on text conditions.
The interaction between age and private study is not too surprising given the relatively short time that the kindergartners had been
studying privately. It is unlikely that any instructional benefits would be present after such a short time, and it is unusual for most
children to be receiving lessons at that young age regardless of their ability. The improvement in performance for older children may
indicate the benefits of private training, or simply reflect the natural tendency of students with musical interests or aptitude to seek
extra instruction.
Tan, N., Aiello, R., & Bever, T. G. (1981). Harmonic structure as a determinant of melodic organization. Memory and Cognition, 9,
533-539.
Welch, G. F., Sergeant, D., & White, P. J. (1995-1996). The singing competencies of five-year-old developing singers. Bulletin of the Council for Research in Music Education, 127, 155-162.
Proceedings paper
As with the Chinese, the meaning of songs for Africans also differs from that of the West. In African culture, songs mean much more than songs in the Western sense, which implies a piece of music with fixed melodies and words to be sung by the human voice. Traditional African songs do not have fixed melodies, and one can always improvise and make variations on the melodies. Thus, very often, a traditional African song will not be sung in exactly the same way by two people, or by the same person at different times. Furthermore, music and dance are integrated in African songs, and one cannot sing an African song authentically without the accompanying movement and dance. All these cultural differences should be noted when comparing African and Western songs and singing.
The two examples above show that we need to be cautious when carrying out cross-cultural
comparison of children's song-learning and singing. As we can see, the meanings of songs and
singing are not universal. Obviously, this can exert effects on children's singing behaviour.
When a Chinese child is asked to sing, he or she sings with the musical quality of the language, like reciting a poem. When an African child is asked to sing, he or she sings with creativity in the music and combines the singing with natural body movement. Therefore, it is important to take into consideration the meanings of songs and singing for children from different cultures before making any comparison.
Among the various measurements of singing used in past research to compare children's singing collected from different cultures, two basic categories emerge: human ratings of pitch accuracy and computer analysis of the sound recorded from children's singing (Welch, 1994). The
human-based analysis offers us more information on the musical aspects of singing, based on human perception, but there is a risk of obtaining data which are biased by the cultural background of the raters. Thus, in cross-cultural research, it helps if the raters are selected from the different cultures under study. On the other hand, the computer-based analysis provides objective data free of human bias. It gives precisely all the minute details of different properties of sound, such as frequency, amplitude, and spectrum, which cannot be observed by the human ear. Yet it cannot attribute musical meaning to the sound, a quality which is only possible through the human ear and mind. Therefore, where possible, there should be a balance between the two kinds of analysis. Human assessment is important to inform us of the phenomena in singing which are significant to the human ear. At the same time, scientific measurements can provide detailed and reliable data to support and verify human judgement. Thus, each method can help compensate for the inadequacies of the other.
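To illustrate what the computer-based analysis involves, here is a minimal sketch of a fundamental-frequency estimator based on autocorrelation; a real analysis of children's singing would add framing, windowing and voicing detection, and the parameter values below are assumptions for the example, not anything prescribed by the text:

```python
import numpy as np

def estimate_f0(signal, sr, fmin=100.0, fmax=1000.0):
    """Crude fundamental-frequency (F0) estimate via autocorrelation.

    Finds the autocorrelation peak within the lag range corresponding
    to the [fmin, fmax] frequency band and converts it back to Hz.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    # Keep only the non-negative lags of the full autocorrelation.
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

# Sanity check on a synthetic 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(0, 0.2, 1 / sr)
tone = np.sin(2 * np.pi * 440 * t)
print(round(estimate_f0(tone, sr), 1))
```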
In addition, it is important to include qualitative analysis of some data on the cultural background of
the children under examination. Such descriptive qualitative data can inform us more about the
possible cultural factors that can affect the findings, and thus, be used to supplement our knowledge of
the issue studied in context. If this aspect of analysis is not considered, the findings will be out of context and thus will lose much of their significance.
The study of cultural differences in music is an exciting area of research. However, it must be conducted with caution, considering the various issues in context. If such research is designed and conducted in an appropriate manner, it can make an important contribution to our knowledge of music and musical behaviour.
References:
Chen-Hafteck, L. (1998) Pitch abilities in music and language of
Cantonese-speaking children, International Journal of Music Education, 31,
14-24.
Chen-Hafteck, L. (1999a) Tonal languages and singing in young children. In S. W.
Yi (Ed) Music, Mind, and Science, pp. 479-494. Seoul, Korea: Seoul National
University Press.
Chen-Hafteck, L. (1999b) Singing Cantonese children's songs: Significance of the
pitch relationship between text and melody, Music Education Research, 1, 1,
93-108.
Chen-Hafteck, L. (1999c) Discussing text-melody relationship in children's
song-learning and singing: a Cantonese-speaking perspective, Psychology of
Music, 27, 1, 55-70.
Rutkowski, J. & Chen-Hafteck, L. (2000) The singing voice within every child: a
cross-cultural comparison of first graders' use of singing voice. Paper to be
presented at the Ninth ISME Early Childhood Seminar, Kingston, Canada, and the
ISME World Conference, Edmonton, Canada.
Proceedings paper
instant need to be taken into account. If the music is performed and thus contains expressive timing, it is necessary to use
adaptive oscillators (Large & Kolen, 1994; Toiviainen, 1998).
The perceptual salience of each pulse sensation is modeled with the resonance value of the respective oscillator. The
contribution of each tone to the resonance of each oscillator depends on the degree of synchrony between the tone onset and
the oscillator's pulse, the inter-onset interval following the onset, and the pitch of the tone. To study the effect of these
different factors, three different models were used.
Model 1. Model 1 relies solely on the temporal structure of the music. The resonance dynamics are modeled with a damped
system driven by an external force. More specifically, the resonance value r_n of oscillator n is determined by
τ² · d²r_n/dt² + b·τ · dr_n/dt + r_n(t) = F_n(t), (1)
where F_n is the driving force, b is the damping constant, and τ is the time constant. The first-order time derivative is
included in order to smooth the resonance function. The parameter τ
models the length of the temporal integration window: in the absence of any external force, the resonance value decays
approximately by a factor of e⁻¹ during an interval of τ.
The driving force has the form
F_n(t) = o_n(t_i) · exp(−(t − t_i)/τ), (2)
where o_n(t_i) is the output of oscillator n at the most recent tone onset t_i. According to Equations 1 and 2, the oscillators that are
at the peak of their output start to increase their resonance up to the next note onset. Due to the exponential decay of the
driving force, the increase of the resonance is proportional to the perceived durational accent of the respective tone (Parncutt,
1994). The resonance value of each oscillator is weighted according to its oscillation period: the closer the period is to the
period of the most salient pulse sensations, the higher the weighting.
At each instant, the oscillator with the highest resonance represents the perceived pulse. This oscillator is referred to as the
winner. To model the stability in maintaining the tapping mode observed in tapping studies, the winner w is changed only when
the highest resonance value exceeds that of the winner by a switching threshold θ_s. In other words, a switch in the tapping
mode occurs when
max_n r_n(t) > (1 + θ_s) · r_w(t). (3)
The model produces a tap whenever the winner oscillator has zero phase and its resonance exceeds the tapping threshold
θ_t.
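The resonance-and-winner scheme just described can be sketched in discrete time. This is a simplified first-order approximation; the function names, the exact update rule, and the parameter values are illustrative rather than the authors' implementation:

```python
def update_resonances(resonances, forces, dt, tau=4.0):
    """One integration step: each resonance decays with time constant tau
    (a first-order simplification of the damped dynamics) and is driven by
    the force derived from the most recent tone onset."""
    return [r + dt * (-r / tau + f) for r, f in zip(resonances, forces)]

def pick_winner(resonances, winner, switch_threshold=0.2):
    """Change the winner only when the maximum resonance exceeds the
    current winner's resonance by the switching threshold (here 20 percent)."""
    best = max(range(len(resonances)), key=resonances.__getitem__)
    if resonances[best] > (1.0 + switch_threshold) * resonances[winner]:
        return best
    return winner
```

A tap would then be produced whenever the winner oscillator reaches zero phase with a resonance above the tapping threshold.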
Model 2. Model 2 is similar to Model 1, with the addition that it takes pitch height into account. It does so by passing the
tone information through a bank of Gaussian filters that are equidistantly spaced on the pitch dimension. This filter bank
divides the input into several pitch channels, to each of which the resonance dynamics scheme is applied separately.
Therefore, the model segregates the input into a set of streams depending on pitch height. For each pulse mode, the
resonance value is then obtained by summing the resonance values across all the channels. Each pitch channel has an
individual weight that depends on the center pitch of the channel according to
w(p) = 2^(−ρ(p − 64)/12), (4)
where p is the center pitch, with 64 corresponding to C4, and ρ is the pitch weighting parameter. When ρ = 0, all channels
receive an equal weighting; when ρ > 0, low pitches receive a higher weighting than high pitches.
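One way to make the pitch-channel scheme concrete is sketched below, pairing a Gaussian filter response with an exponential channel weight; the functional forms, the reference pitch 64, the bandwidth, and the value ρ = 0.5 are illustrative assumptions, not taken from the paper:

```python
import math

def channel_weight(center_pitch, rho=0.5, ref_pitch=64):
    """Weight of a pitch channel: equal weighting when rho == 0, and
    progressively heavier weighting of low channels when rho > 0
    (assumed exponential form)."""
    return 2.0 ** (-rho * (center_pitch - ref_pitch) / 12.0)

def gaussian_response(pitch, center, sigma=2.0):
    """Response of one Gaussian pitch filter to a tone at `pitch`
    (sigma, in semitones, is an assumed bandwidth)."""
    return math.exp(-0.5 * ((pitch - center) / sigma) ** 2)
```

With ρ = 0.5, the weight grows by a factor of 2^0.5 ≈ 1.4 per descending octave, matching the optimal value reported below.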
Model 3. Model 3 is similar to Model 2, with the addition that it weights the notes according to their tonal significance. It
assumes that tonally significant tones increase the salience of the pulses with which they co-occur more than less
significant tones do. The model uses the key-finding algorithm by Krumhansl (1990), with the modification that it uses an
exponential time window for integrating the pitch information. For each tone, the driving force of Equation 2 is weighted by
the value of the respective component of the probe-tone profile (Krumhansl & Kessler, 1982) of the current key.
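The key-finding step can be illustrated as follows: an exponentially decaying pitch-class distribution is correlated with the Krumhansl & Kessler (1982) major-key profile. Only major keys are sketched, and the exact windowing of the original model is an assumption; the correlation-based matching follows Krumhansl's (1990) algorithm in spirit.

```python
import math

# Krumhansl & Kessler (1982) probe-tone ratings for a major key,
# listed from the tonic (C major ordering: C, C#, D, ...).
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def accumulate(pc_weights, pitch_class, dt, tau=4.0):
    """Exponential time window: decay the running pitch-class weights by
    the elapsed time dt, then add the new tone."""
    out = [w * math.exp(-dt / tau) for w in pc_weights]
    out[pitch_class] += 1.0
    return out

def key_correlation(pc_weights, tonic):
    """Pearson correlation between the weighted pitch-class distribution
    and the major profile rotated so that `tonic` is the key's first degree."""
    profile = MAJOR_PROFILE[-tonic:] + MAJOR_PROFILE[:-tonic]
    mx = sum(pc_weights) / 12.0
    my = sum(profile) / 12.0
    cov = sum((x - mx) * (y - my) for x, y in zip(pc_weights, profile))
    sx = math.sqrt(sum((x - mx) ** 2 for x in pc_weights))
    sy = math.sqrt(sum((y - my) ** 2 for y in profile))
    return cov / (sx * sy)
```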
Each of the three models had approximately equal optimal parameter values: a temporal integration window of τ = 4 seconds; a
pitch weighting such that the weighting increases by a factor of approximately 1.4 for each descending octave; and a switching
threshold of θ_s = 0.2, accepting switches only when the maximum resonance exceeds that of the winner by at least 20 percent.
The meaning of the optimal tapping threshold value is more difficult to interpret.
Comparison between human and model data
RMS errors. The total RMS error values for the optimized models were 1.78, 1.82, and 1.19 for Models 1, 2, and 3,
respectively. In terms of the total RMS error, Model 3 thus performed best, followed by Model 1 and Model 2, in that order.
Table 1 shows the root-mean-square (RMS) errors for each performance measure and model separately. For the performance
measures BST, down, up, and switches, the lowest RMS error was obtained with Model 3. The lowest RMS errors for the
performance measures neither and aper were obtained with Models 2 and 1, respectively.
TABLE 1. RMS errors between human and model data
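The two comparison statistics used here, RMS error and (below) Pearson correlation between human and model values of a performance measure, can be computed as in this sketch (variable names are illustrative):

```python
import math

def rms_error(human, model):
    """Root-mean-square error between human and model values of one
    performance measure across stimuli."""
    n = len(human)
    return math.sqrt(sum((h - m) ** 2 for h, m in zip(human, model)) / n)

def pearson(human, model):
    """Pearson correlation of the same paired values."""
    n = len(human)
    mh, mm = sum(human) / n, sum(model) / n
    cov = sum((h - mh) * (m - mm) for h, m in zip(human, model))
    sh = math.sqrt(sum((h - mh) ** 2 for h in human))
    sm = math.sqrt(sum((m - mm) ** 2 for m in model))
    return cov / (sh * sm)
```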
Correlations. Table 2 shows the correlations between the human and model data for each performance measure and model.
The average correlations, taken across the six performance measures, are 0.417, 0.463, and 0.553 for Models 1, 2, and 3,
respectively. As can be seen, the highest correlation for all performance measures except switches was obtained with Model
3. For Model 3, all the correlations except that for BST are significant at the p < 0.05 level. Figure 1 shows the performance
measures obtained from the subjects and from Model 3.
Figure 1. Scatter plots of the six performance measures taken from subjects and Model 3. Each point represents one stimulus;
its abscissa and ordinate correspond to human and model data, respectively.
Conclusion
References
Boltz, M., & Jones, M.R. (1986). Does rule recursion make melodies easier to reproduce? If not, what does?
Cognitive Psychology, 18, 389-431.
Brown, J. C. (1993). Determination of meter of musical scores by autocorrelation. Journal of the Acoustical
Society of America, 94(4), 1953-1957.
Clarke E. F. (1999). Rhythm and timing in music. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp.
473-500). New York: Academic Press.
Dannenberg, R. B. & Mont-Reynaud, B. (1987). Following a jazz improvisation in real time. In Proceedings of
the 1987 International Computer Music Conference. San Francisco: International Computer Music Association,
241-248.
Dawe, L. A., Platt, J. R., & Racine, R. J. (1994). Inference of metrical structure from perception of iterative
pulses within time spans defined by chord changes. Music Perception, 12(1), 57-76.
Desain, P. & Honing, H. (1989). The quantization of musical time: a connectionist approach. Computer Music
Journal, 13(3), 56-66.
Deutsch, D. (1980). The processing of structured and unstructured tonal sequences. Perception &
Psychophysics, 28, 381-389.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149-180). New
York: Academic Press.
Gasser, M., Eck, D., & Port, R. (1999). Meter as mechanism: a neural network model that learns musical
patterns. Connection Science, 11, 187-215.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context.
Perception & Psychophysics, 32, 211-218.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220,
671-680.
Krumhansl, C. L. & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a
spatial representation of musical keys. Psychological Review, 89, 334-368.
Large, E. W. & Kolen, J. F. (1994). Resonance and the perception of musical meter. Connection Science, 6(2-3),
177-208.
Lerdahl, F. & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Longuet-Higgins, H. C. & Lee, C. S. (1982). Perception of musical rhythms. Perception, 11, 115-128.
McAuley, J. D., & Kidd, G.R. (1998). Effect of deviations from temporal expectations on tempo discrimination
of isochronous tone sequences. Journal of Experimental Psychology: Human Perception and Performance, 24,
1786-1800.
Monahan, C. B., Kendall, R. A., & Carterette, E. C. (1987). The effect of melodic and temporal contour on
recognition memory for pitch change. Perception & Psychophysics, 41, 576-600.
Palmer, C. & Krumhansl, C. (1990). Mental representations of musical meter. Journal of Experimental
Psychology: Human Perception and Performance, 16, 728-741.
Proceedings paper
MUSICALITY AND MUTUALITY : THE DEVELOPMENT OF MELODY
Dr Gudrun Aldridge
gudruna@uni-wh.de
Background:
Aims:
Main contributions:
Implications
Music therapy assists an individual to find his own expressive style that
relates to another person. Thus adult human communication can be seen as having
its foundations in an inherent musicality.
Proceedings abstract
Tokyo,
Background.
Singers may control the rate and magnitude of vibrato to express emotions
effectively in the singing voice. The effects of controlling vibrato on the
emotional content of the singing voice have not yet been sufficiently clarified.
Aims.
Method.
Ten professional opera singers sang /a/ with four intended emotions: happiness,
anger, sadness, and fear. The emotional profile of each recorded sample
was evaluated by 25 listening subjects using seven-point rating scales. An
object-oriented acoustic analysis system was used to evaluate vibrato
characteristics, including its rate and magnitude.
Results.
Although the vibrato rate varied only within the narrow range of 5.1 to 5.6 cycles
per second, the rate was found to have a significant effect on the perceived
emotions, with the fastest rate for fear and the slowest for sadness. The vibrato
magnitude, which was smallest for fear and largest for anger, also showed
a significant effect on the perceived emotions. The rise time of vibrato at
the initial part of the sung vowel also varied depending on the emotion, with the
fastest onset for anger and the slowest for sadness.
Conclusions.
These results suggest that singers adjust fine characteristics of vibrato
so as to express emotions, and that these emotions are reliably transmitted to the
audience even in short vowel segments.
Proceedings abstract
Background:
Aims:
Method:
Participants were presented with two pieces. Both are built on two basic
structures (A and B), exposed at the beginning of the piece; the rest of the
piece consists of variations of both structures. After listening to the whole
piece, children performed three tasks. First, they listened to both parent
structures and had to judge which of them was more frequent in the piece. In
the second task, they listened to the derivatives of both structures and had to
categorise them as A-derivatives or B-derivatives. In the third task, they were
asked to rate the degree of similarity between the parent structures and their
derivatives.
Results:
Both musicians and non-musicians should be equally able to perform the first
task. In the second task, although both groups should perform above chance, an
advantage should appear for musicians. Musicians should perform much better
than non-musicians in the third task. Age should interact with musical tuition,
in the sense that these tendencies should be accentuated in older children.
Conclusions:
Proceedings paper
Introduction
Reductional music theories propose a kind of musical listening that involves the mental processing of a series of events belonging to
the musical surface, establishing a hierarchy in which some of them receive more structural importance and, in that way, have their
"existence" prolonged. Harmony defines the tonal space in which prolongation develops, interacting with the melody, which impels
the movement (Salzer & Schachter, 1969). Both the notes that are prolonged as time passes and the relationships of
tension and relaxation between events are important features of this interaction. This musical organisation is substantially governed
by voice-leading principles derived mainly from strict counterpoint. Beyond the musical surface an underlying organisation
takes place, whose perceptual reality this study intends to investigate. The process of abstracting the main events from the complete musical
piece, called reductional representation (McAdams, 1989), favours the attribution of tonal coherence to a musical piece (Salzer,
1962).
This phenomenon has been studied from a psychological perspective (Serafine, 1988; Serafine, Glassman &
Overbeeke, 1989; Bigand, 1990, 1994; Dibben, 1994; Martínez & Shifres, 1999a, 1999b; Shifres & Martínez, 1999) using
different experimental paradigms: goodness of fit between a melody and the rendered reduction of its underlying voice leading
(henceforth UVL), similarity judgement between melodies with the same or different UVL, and family categorisation according to
the resemblance of melodies.
Nevertheless, the study of reductional representation faces methodological difficulties in applying an experimental paradigm:
given that the events belonging to the underlying hierarchic levels also belong to the musical surface, when a structural event is
modified to create appropriate experimental conditions, the surface is simultaneously modified. Thus, it is difficult to specify the level
on which the listener's response is based.
The study of the cognitive reality of reductional representation should begin with the analysis of the reciprocal influences between
the surface and UVL attributes. Our own research (Martínez & Shifres, 1999a, 1999b; Shifres & Martínez, 1999) was devoted to
monitoring this relationship using a similarity-judgement paradigm. Results revealed that neither the contour hypothesis (according to
which two melodies are judged as more similar the higher the association between their contours) nor the structure hypothesis
(which predicts that two melodies will be judged as more similar if they have the same UVL) could by itself explain the perceptual
similarity. Thus, a perceptual rivalry hypothesis was formulated.
This work follows a previous experiment which aimed to test that hypothesis, analysing further melodic components in order to
verify previous interpretations of the results and to differentiate surface components from those of the UVL. It is an epistemological
exercise which asks what kind of melodic knowledge the listener uses while judging the similarity of melodies. It is assumed that
the explanations provided by alternative models, more formalised and accepted than the one previously employed, may be useful for
the analysis of both components. This is expected to lend further support to the exploration of the psychological reality of the
assumptions of the reductional representation of the tonal hierarchic structure.
The Baseline Experiment section summarises relevant aspects of the cited investigation in order to clarify the actions which followed
it. The next section, The Models, synthesises the advantages of four models for analysing melodic attributes. Two of them (the
Combinatorial Model and the Oscillations Model) focus on surface components, studying note-to-note relationships and considering only the
particular melodic information of the musical examples. The other two (Tonal Weights and Melodic Anchoring) emphasise structural
aspects of tonal melodies, because they are based on the study of invariants of the tonal system. The Method section describes the
analysis of the melodies used in the former experiment in terms of the four models mentioned. In the Results section, empirical data from
the experiment are interpreted from the point of view of these new analyses. Finally, in the last section, the contributions of these models
and the different paradigms are discussed.
Figure 1. Procedure followed in the composition of the stimuli. Example No. 8: Chopin, Study Op. 25 No. 5. a) selection
of the fragment (Melody A); b) analysis of the underlying voice leading; c) reduction of the underlying voice leading
(R1); d) transformation of R1 into R2; e) reconstruction of a melody (Melody C) from R2; f) reconstruction of a melody
(Melody B) from R1, homologating the changes between B and A to the changes between C and A.
146 adults with different levels of musical experience took part in the experiment. The experimental task consisted of listening to the
sequence AB - AC (or AC - AB) and judging which of the two comparison melodies (B or C) was the most similar to A. In addition,
subjects had to estimate their level of certainty in the answer using a three-point scale (very sure, not so sure, not sure). It was
hypothesised that listeners would judge melody B as the most similar, given that it has the same UVL, although different degrees of
association between surfaces would cause confusions in the responses. Thus, responses for melodies belonging to the AC Group would
be less certain than responses for BC Group melodies. Data for B/C responses and certainty ratings were translated into a single score
ranging from 1 (very certain C) to 6 (very certain B), where 3 and below represent "C" and 4 and above represent "B". Thus, the test
value was 3.5.
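The translation of a B/C choice plus a certainty rating into the single 1-6 score can be sketched as follows (the function and label names are illustrative):

```python
def response_score(choice, certainty):
    """Map a B/C choice and a three-level certainty rating onto the 1..6
    scale: 1 = very certain C, 6 = very certain B; 3.5 is the test value."""
    level = {"not sure": 1, "not so sure": 2, "very sure": 3}[certainty]
    return 3 + level if choice == "B" else 4 - level
```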
Results confirmed the prediction: (a) subjects always tended to judge melody B (same UVL) as the most similar; and (b) different levels
of scores showed that structural and superficial attributes compete, causing different levels of perceptual rivalry.
When the UVL is modified, the surface level is inevitably modified as well. For that reason, the experimental control of the
theoretical rivalry was based on the identification and exhaustive control of the surface attributes, which, in spite of all the precautions
taken, change anyway.
The purpose of the present work is then to test the pertinence of alternative theoretical models for describing those melodic attributes
which were modified while modifying the UVL. It is expected that, if the models are useful in accounting for different melodic
information, they may help in finding a more precise estimation of the real incidence of the UVL in the similarity judgements. The
stimuli used in the previous experiments were therefore analysed according to the following models: the Oscillations Model, the Combinatorial Model,
Tonal Weights, and Melodic Anchoring.
The Models
Oscillations Model (Schmuckler, 1999)
Proposed by Schmuckler (1999), this is probably the simplest idea for describing the melodic contour, considering the most superficial
level of note-to-note relationships. In order to differentiate this measure from others used in former studies, and at the same time to
capture the melodic information while avoiding any type of structural component, the simplest version of the model was used, which
consists of counting the number of ascents and descents of the melodic contour.
Thus, both models (a) capture the information of ascents and descents without considering the interval dimension, and (b) represent
two edges in relation to the temporal focus required: from the note-to-note level (Oscillations Model) to the level which considers
the long-term linear connections that may be present in this type of short melodies (Combinatorial Model).
Method
Oscillations Analysis
The number of reversals in the direction of the melody was counted. This provided a measure of the tendency of movement. Although
the original version of the model does not include repetitions, since in this test three melodies with the same rhythm were heard, it was
assumed that a repetition would be clearly noticed as a change of direction (reversal is thus understood as a change of direction).
Two measurements were obtained:
1. Differences of oscillations (*) of melodies B and C compared to melody A:
RSIM = rsimAB - rsimAC
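Counting reversals as described, with a repeated pitch treated as a change of direction, can be sketched as:

```python
def oscillations(pitches):
    """Count changes of direction in a melodic contour. Each successive
    interval is classified as up (+1), down (-1) or repetition (0); any
    change of class counts as a reversal, so repetitions register as
    direction changes, as assumed in the text."""
    steps = [(b > a) - (b < a) for a, b in zip(pitches, pitches[1:])]
    return sum(1 for s1, s2 in zip(steps, steps[1:]) if s1 != s2)
```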
Figure 2. Procedure followed to compare the theoretical similarity of the melodies according to the four models analysed.
Combinatorial Analysis
Each melody was represented with a matrix. The matrices were compared in pairs, counting the number of identical entries and
dividing it by the total number of entries of the matrix (excluding the diagonal; see Figure 2). These proportions (csim) generated two
measures:
1. Difference of proportions (*):
CSIM = csimAB - csimAC
This value ranges from 1 to -1. If positive, A and B have the highest theoretical similarity; if negative, this relationship holds
between A and C.
2. Classification of trios (**) according to whether the highest theoretical similarity corresponds to AB (AB Group) or to AC (AC Group)
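The matrix comparison can be sketched as below, using a contour matrix in which entry (i, j) records whether note j is higher than, equal to, or lower than note i; the representation is assumed from contour theory, and csim is the proportion of matching off-diagonal entries:

```python
def contour_matrix(pitches):
    """Entry (i, j) is +1, 0 or -1 according to whether note j lies
    above, at, or below note i."""
    return [[(q > p) - (q < p) for q in pitches] for p in pitches]

def csim(m1, m2):
    """Proportion of identical entries between two equally sized contour
    matrices, excluding the main diagonal."""
    n = len(m1)
    same = sum(m1[i][j] == m2[i][j]
               for i in range(n) for j in range(n) if i != j)
    return same / (n * (n - 1))
```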
∆TW = |∑TWP[A] − ∑TWP[B]|
If the tonal weights are similar, ∆TW will be close to 0, and the perceptual similarity between both melodies will be higher.
5. The same procedure was followed with AC. The following measures were then obtained:
1. Proportion of differences between tonal weights (*). Given that subjects listened to the sequence AB - AC, it is possible
that the relative value of ∆TW in a pair within the total ∆TWs of the whole sequence may influence the
answer. Thus,
∆TW% = ∆TWAB / (∆TWAB + ∆TWAC)
The more similar the amounts of tonal weights between B and C, the closer the proportion tends to 0.5; in this case tonal
weights will not influence the subject's decision for either of the two melodies. The closer this proportion is to 1,
indicating that the difference in tonal weight of AB is higher than that of AC, the more the similarity judgements will
tend toward C. Thus, it is possible to estimate a correlation between ∆TW% and the means of perceptual similarity
observed in the experiment.
2. Classification of trios (**) according to their ∆TW. Trios were classified according to the lowest ∆TW found
between the pairs AB and AC:
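The proportion measure can be sketched directly (names are illustrative):

```python
def delta_tw_proportion(dtw_ab, dtw_ac):
    """Relative tonal-weight difference within an AB - AC sequence:
    0.5 means the tonal weights favour neither comparison melody, while
    values approaching 1 indicate a larger AB difference, predicted to
    push similarity judgements toward C."""
    return dtw_ab / (dtw_ab + dtw_ac)
```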
Results
The continuous measurements obtained were used to run a linear regression analysis. It showed that none of the models could predict
the differences in the similarity judgements.
Predictions emerging from the classification of the models taken in pairs were compared: Anchoring vs. Oscillations; Anchoring vs.
Combinatorial Model; Anchoring vs. Tonal Weights; Tonal Weights vs. Oscillations; Tonal Weights vs. Combinatorial Model; and
Oscillations vs. Combinatorial Model. According to this comparison, the melodies could represent different possibilities of
agreement or disagreement between models: Agreement B (when the trio belongs to the AB Group in both models and they agree in
predicting the highest similarity of B); Agreement C (when the trio belongs to the AC Group in both models and they agree in predicting
the highest similarity of C); and Rivalry (when one model predicts the highest similarity of B and the other predicts the highest
similarity of C).
It was predicted that the ratings of perceptual similarity for the melodies classified as Rivalry in the comparative analysis of the
models would be intermediate between examples in Agreement B and examples in Agreement C (Figure 3).
Figure 3. Values of predicted perceptual similarity for the cases of agreement and disagreement about the theoretical
similarity predicted by a pair of models.
A 3 (Agreement/Rivalry) × 5 (Comparisons) repeated-measures ANOVA showed significant results for the factor Agreement/Rivalry
(F[2, 143] = 91.224; p < .001). Thus, Agreement B represented higher perceptual similarity of B, Agreement C represented a higher
relative tendency toward C, and an intermediate value represented those cases in which the comparison of the two models resulted in
alternative predictions (Rivalry) (Figure 4).
However, none of these means is lower than 3.5, showing that subjects always judged B as the most similar. This implies that the
models do not capture all the information being used in the similarity judgements.
The factor Comparisons was also significant (F[4, 141] = 24.616; p < .001), indicating that the combined effect of the two models within
a pair differed. This means that, as observed in the figure, each pair of compared models shows different values of
perceptual similarity in the agreement for B or C. For example, the perceptual similarity of B is better predicted by the agreement
between the Combinatorial Model and Melodic Anchoring or Tonal Weights than by the agreement between the Oscillations
Model and any of the other models.
The most curious result is that the interaction between both factors was also significant (F[12, 134] = 11.906; p < .001), indicating that if
a pair of models represents the best combination for predicting the perceptual similarity of B, this is not the case for the predicted
similarity of C, and vice versa. Moreover, it can be observed in the figure that although the tendency of highest similarity can be
predicted by different combinations of two models, the rivalry between them is only well defined by the combination of the
Oscillations Model and the Anchoring Model.
Figure 4. Values of observed perceptual similarity for the cases of agreement and disagreement in the theoretical similarity predicted by
5 different pairs of models.
Discussion
This study tested different models of melodic analysis in order to investigate whether listeners capture the attributes they
describe while judging similarities. The aim was to contrast the conclusions of a previous study, according to which, under certain
constraints, the Underlying Voice Leading was the attribute listeners took into account when making similarity judgements,
even though a rivalry between the melodic contour and the UVL could be inferred. The results of the contrast with the alternative
models show that:
a. A rivalry also exists between the attributes captured by the models and the UVL;
b. In spite of that rivalry, choices always favoured the melody with the same UVL, the mean always being higher than 3.5;
c. The UVL is an attribute which listeners seem to capture when they judge similarities, and the different models do not explain these
responses;
d. Therefore, according to the tendencies observed in these studies, the UVL is considered to have been isolated as an
experimental variable.
One of the main methodological difficulties that the experimental study of the UVL presents, as a plausible explanation of the cognition
of musical structure, is that the principles governing its relationships are described neither in parametric terms nor according to
categories defined in absolute terms. In psychology, the analysis of similarities and differences among data is considered a valid
methodological tool for capturing the underlying (unmeasured) structure of the objects under observation.
Thus, the application of the similarity-judgement paradigm in the Baseline Experiment was pertinent to the purpose of the former
study.
Another methodological challenge has to do with the fact that, although the theory is expressed in prescriptive terms, the derivation of these
principles is an interpretative matter. The current study appraised models which analyse the processing of melodic information in
parametric ways. Thus, contrasting the explanations provided by such approaches with the "heuristics" of the UVL was intended to
isolate the latter from the rest of the components. The measurements accounted for an important range of features characterised by
the models, from the note-to-note level of the musical surface to the invariants of the musical structure.
Although the Melodic Anchoring principle characterises some aspects of the melody by invoking voice-leading principles, there are
differences between this theoretical framework and the one provided by Schenkerian theory. Thus:
a. Melodic anchoring is prospective, while melodic diminutions may be either prospective or retrospective, i.e. prefix or suffix
(Forte & Gilbert, 1982).
References
Bharucha, J. J. (1984a). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, 16, 485-518.
Bharucha, J. J. (1996). Melodic anchoring. Music Perception, 13(3), 383-400.
Bigand, E. (1990). Abstraction of two forms of underlying structure in a tonal melody. Psychology of Music, 18, 45-60.
Bigand, E. (1994). Contributions de la musique aux recherches sur la cognition auditive humaine. In S. McAdams & E.
Bigand (Eds.), Penser les sons. Psychologie cognitive de l'audition (pp. 249-298). Paris: Presses Universitaires de France.
Dibben, N. (1994). The cognitive reality of hierarchic structure in tonal and atonal music. Music Perception, 12(1), 1-25.
Forte, A. & Gilbert, S. ([1982] 1992). Introducción al Análisis Schenkeriano [trans. of Introduction to Schenkerian
Analysis by Pedro Purroy Chicot]. Barcelona: Labor.
Krumhansl, C. (1995). Music psychology and music theory: Problems and prospects. Music Theory Spectrum, 17(1), 53-80.
Krumhansl, C. L. (1990). Cognitive Foundations of Musical Pitch. New York: Oxford University Press.
Martínez, I. C. & Shifres, F. (1999a). Music education and the development of structure hearing: A study with
children. In M. Barrett, G. McPherson & R. Smith (Eds.), Children and Music: Developmental Perspectives. Launceston:
University of Tasmania.
Martínez, I. C. & Shifres, F. (1999b). The rivalry between structure and surface while judging the similarity of melodies.
Paper presented at SMPC99, Evanston, Illinois, USA.
Marvin, E. W. & Laprade, P. (1987). Relating musical contours: Extensions of a theory for contour. Journal of Music
Theory, 31, 225-267.
McAdams, S. (1989). Psychological constraints on form-bearing dimensions in music. Contemporary Music Review, 4, 1-7.
Quinn, I. (1999). The Combinatorial Model of pitch contour. Music Perception, 16(4), 439-456.
Salzer, F. & Schachter, C. (1969). Counterpoint in Composition. New York: Columbia University Press.
Salzer, F. ([1962] 1990). Audición estructural. Coherencia tonal en la música [trans. of Structural Hearing: Tonal
Coherence in Music by Pedro Purroy Chicot]. Barcelona: Labor.
Proceedings paper
Figure 1 presents the tonogramme of the text of the song "Come All You Fair and Tender Ladies", compared with
its melody line. A tonogramme is a grid in which dashes and dots represent stressed and unstressed syllables.
Falling and rising speech intonation at the end of a phrase is shown by a downward or upward curve, or by a
dash with a dot in case the last syllable in the phrase is unstressed.
Of course, compared with speech intonation, music intonation has a wider range of tones (several octaves),
with deeper differentiation and even redundancy. (While speaking, we usually use a range of four tones of the
octave to convey the content of an utterance; emotionally coloured speech can cover six tones.) However, there seem to be
many concurrences in the general motion of music and speech intonation within one and the same phrase; that is, the
rising and falling tendencies of speech intonation reveal themselves in the modulations of the melody.
A similar analysis of modern popular songs, which are often used in English language courses, showed that
this concurrence occurs far from always. See, for example, "Tom's Diner" (Figure 2). The melody phrase ending
with the word "actor" has a falling intonation, whereas the speech intonation in this phrase should rise before the
subordinate clause "who had died while he was drinking". The subordinate clause concludes the affirmative
utterance, so its speech intonation falls, although the melody line in this place rises.
So the question arises: what influence does the use of songs with concurrent or non-concurrent melody and
speech intonation have on the formation of correct speech intonation when learning a foreign
language?
To test our assumption that the melody of a song influences correct or incorrect memorization of speech
intonation, we conducted an experiment. It was held at Kharkiv National University, Ukraine, and lasted four
months. Twenty-four first-year students of the Foreign Languages Department were given nine songs with
specially designed activities aimed at practicing particular grammar structures and vocabulary. During the
lessons the students listened to each song three to five times.
In six of the selected songs the melody line did not deviate from the speech intonation of the lyrics, but in three
songs it did, in particular in "Tom's Diner" (fig. 2). At the final stage of the experiment the participants were
asked to reproduce some phrases from the lyrics in another context. The students' performance showed that
where melody and speech intonation deviated, the phrases stuck in the students' memory with the wrong
intonation imposed by the melody.
The students themselves noted that the song melody they had memorized "wronged" their speech intonation.
A year later a similar experiment was conducted at Kharkiv State Pedagogical University with another 22
first-year students of the Department of Foreign Philology. The results were the same. They allow us to conclude
that there really is a certain influence of song melody on the memorization of speech intonation during foreign
language learning: quick memorization of the melody imposes its intonation patterns, thus hindering the use of
the correct ones.
We believe that this question requires further investigation both by music psychologists and psycholinguists.
However, even now we can suggest that song material for teaching and developing different language skills
should be carefully selected, so that the melody fully coincides with the speech intonation patterns of the lyrics.
We also believe that a similar process may be observed in other languages, so further development of the
problem under consideration could help not only teachers of English but also teachers of other languages.
I would like to thank Dr. T. Merkulova for fruitful discussion of the topic.
References
Asafiev, B. V. (1965). Speech Intonation. Moscow-Leningrad. (in Russian)
Ginsborg, J. (1999). Unpublished PhD thesis, University of Keele, UK.
Grenough, M. (1995). Sing it! Learn English through Songs. McGraw-Hill.
Griffee, D. T. (1992). Songs in Action. Prentice Hall.
Krashen, S. D. (1985). The Input Hypothesis. London: Longman.
Murphey, T. (1992). Music and Songs. Oxford University Press.
Ruchyevskaya, E. (1973). On methods of speech intonation realization and expressiveness of its meaning. In
Poetry and Music. Moscow: Music. (in Russian)
Silver, J. (1996). English through Songs. London: Silver Songs.
Back to index
Proceedings abstract
A Sensory motor theory III: Human vs Machine performance in a beat
induction task.
N. P. McAngus Todd and S. Kohon.
Department of Psychology, University of Manchester, Manchester, M13
9PL.
1. Background
Over the last few years we have been developing a general theory of
rhythm and timing which is consistent with the idea that the
perception and production of rhythmic sequences in music and speech
is an elaboration of a more general sensory-motor sequencing faculty
of the left hemisphere. The theory is implemented as a computational
model in which an attempt is made to simulate the principal brain
structures involved in sequencing (Todd et al 1998, 1999). The model
takes sound samples as input and synchronises a simple dynamic system
to simulate beat induction.
2. Aims
The aim is to evaluate the model by comparison with the performance
of human subjects in a beat tracking task.
3. Method
In the first experiment, two human subjects and the model were compared
in a beat-tapping task using 160 samples of music (Todd and Kohon,
submitted) with a variety of rhythmic structures and tempi. The
second experiment focused in more detail, with a larger number of
subjects, on a subset of the samples: a performance of the 48 fugue
subjects from the Well-Tempered Clavier by J. S. Bach. In all cases
subjects were instructed to tap the beat in synchrony with the
music. The tapping responses were evaluated by a number of
parameters, including the main tapping period, the time to reach a
steady rate, and the variability of the tapping compared to the ideal
beat.
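The three evaluation parameters can be sketched as follows (a minimal re-implementation under our own assumptions about the criteria, e.g. the 5% steadiness tolerance; it is not the authors' code):

```python
import numpy as np

# Illustrative sketch: given tap onset times (seconds) and the ideal beat
# period, estimate (1) the main tapping period, (2) the time taken to reach
# a steady rate, and (3) the variability of tapping relative to the ideal
# beat. The steadiness criterion (`tol`) is an assumption, not the paper's.

def evaluate_tapping(tap_times, ideal_period, tol=0.05):
    taps = np.asarray(tap_times, dtype=float)
    itis = np.diff(taps)                     # inter-tap intervals
    main_period = float(np.median(itis))     # robust estimate of tapping rate
    # Time to steady rate: first tap after which every remaining ITI stays
    # within `tol` (as a proportion) of the main period.
    rel_err = np.abs(itis - main_period) / main_period
    steady_idx = len(itis)                   # default: never fully steady
    for i in range(len(itis)):
        if np.all(rel_err[i:] <= tol):
            steady_idx = i
            break
    time_to_steady = float(taps[steady_idx] - taps[0])
    # Variability: spread of ITIs as a proportion of the ideal beat period.
    variability = float(np.std(itis) / ideal_period)
    return main_period, time_to_steady, variability
```

For a perfectly regular tapper the variability is zero and the steady rate is reached at the first tap.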
4. Results
In the first experiment the model produced viable responses
(responses with a clear beat rate) in 70% of cases. The human
subjects produced viable responses in between 90 and 95% of cases.
Of the viable responses, the beat rates of the model corresponded
significantly to the beat rates of the human subjects. In the second
experiment the model was indistinguishable from the humans in its beat
rates. The model also compared well on other measures, including the
time to reach a stable beat and the position of the downbeat.
5. Conclusions
The results provide support for the model but also raise several
issues for model improvement, not least that of scene analysis and
the role of top-down processing in the extraction of a beat.
References
Todd, N. P. McAngus, & Kohon, S. (submitted). Testing a sensory-motor
theory of rhythm perception: Human vs machine performance in a beat
induction task.
Todd, N. P. McAngus, Lee, C. S., & O'Boyle, D. J. (1998). A
sensory-motor theory of rhythm and timing in music and speech.
Proceedings of the International Conference on Neural Information
Processing (ICONIP'98), Japan, October 1998.
Todd, N. P. McAngus, Lee, C. S., & O'Boyle, D. J. (1999). A
sensory-motor theory of rhythm, time perception and beat induction.
Journal of New Music Research, 23(1), 25-70.
Back to index
Proceedings abstract
c.trevarthen@ed.ac.uk
Background:
Aims:
This presentation will seek a unified theory of the Intrinsic Motive Pulse by
which, it is proposed, human movements are generated and integrated, and relate
this to movements of the body in bipedal locomotion, communicative gesture and
voluntary acts of all kinds.
Main contributions:
Implications:
Back to index
Proceedings paper
INTRODUCTION
Vibrato is the periodic fluctuation in pitch, amplitude and/or timbre of a musical tone. It is used by singers, string players, and, in some cases, by wind instrumentalists to ornament or
color a tone. Vibrato research has focused on the general characteristics of vibrato, such as its form (generally found to be sinusoidal, but mostly trapezoidal according to Horii,
1989b), its perceived central pitch (mean or median of pitch fluctuation, see, e.g., Shonle & Horan, 1980, Sundberg, 1978), and its mean rate and extent in musical performances (rate
between 5.5-8 Hz, extent between 0.6-2 semitones for singers and between 0.2-0.35 semitones for string players, see Seashore, 1938; Sundberg, 1987; Meyer, 1992). The modeling of
vibrato characteristics has suggested that pitch vibrato is the primary acoustic characteristic of vocal and string vibrato from which amplitude and timbre vibrato result (Horii, 1989a;
Meyer, 1992). The (un)conscious control of vibrato characteristics by professional musicians has been a point of debate. While string players are generally found to be able to
control vibrato rate and extent, singers are said to have very limited or no control, or to have only some control over vibrato extent (Seashore, 1932; Sundberg, 1987; King & Horii,
1993). Dependencies of vibrato on other performance aspects that have been analyzed include (among others): short notes only contain an upper arch (Castellengo, Richard & d'Alessandro, 1989);
notes generally start with a rising pitch (Horii, 1989b) and end in the direction of the transition (Sundberg, 1979). A similar debate concerns the dependency of vocal vibrato on
pitch height, which is confirmed by Horii (1989b) and rejected by Shipp, Doherty & Haglund (1990).
In this paper, we will focus on vibrato as an expressive means within musical performances. In this respect, we assume that vibrato may be used by musicians to stress notes or to
convey a certain musical interpretation. It is an area of research that has recently gained interest and is still at an explorative stage (see the contribution of Gleiser & Friberg to these
proceedings). We turn to musicians' hypotheses concerning the expressive function of vibrato and compare these to observations on the relation between music structural
characteristics and vibrato rate and extent in actual performances. The analyses of the performance data are based on the predictions of expressive vibrato behavior (Sundberg,
Friberg & Frydén, 1991) and on predictions stemming from piano performance research that attributes expressive behavior to the pianist's interpretation of musical structure (e.g.,
Clarke, 1988). The comparison aims to show that scientific inquiry can be inspired by hypotheses from musicians and experts who devote their lives to refining their
control of musical parameters for expressive means, and to teaching this to students. Vice versa, the scientific results can gain musical meaningfulness and value, also for
musicians and teachers.
METHOD
Subjects
Five professional musicians participated in the study: a cellist, an oboist, a tenor, a thereminist, and a violinist. All are known for their performances in
orchestras, chamber ensembles and/or as soloists. Each participant was paid for participation.
Material
The study used the notation of the first phrase of 'Le Cygne' by Saint-Saëns (1835-1921) for the musicians to play from. 'Le Cygne' ('the swan') was originally written for solo cello with
orchestral accompaniment. A piano reduction of the orchestral accompaniment is, however, very common, as is performance of the solo part by melodic instruments other than the cello.
'Le Cygne' is in G major and in 6/4 meter. The first phrase is the theme of the piece; it is four measures long and consists of two sub-phrases of two bars. It is preceded by an
introduction from the accompaniment of one bar (see figure 1). The melody of the first sub-phrase starts with a descending movement in quarter notes (measure 2) and ends with a
counter-movement in longer notes. The melody of the second sub-phrase consists of one long ascending movement in eighth notes that starts and ends on a dotted half note (m3,
figure 1). The accompaniment consists of broken chords in sixteenth notes. The harmony of the broken chords is: the tonic chord in root position (m1-2), the ii sub-dominant chord
with a pedal on G in the bass (m3), a progression from ii to dominant-7 chord with a pedal on G in the bass (m4) and a return to the tonic in root position (m5).
The questions of the interview concerned the production of vibrato on the musician's instrument, the general use and function of vibrato, the specific expressive treatment of the first
phrase of 'Le Cygne', and the differences in this treatment between repetitions. Expressive treatment includes variations in amplitude, vibrato (general), and vibrato rate & extent.
Figure 1 Score of the first phrase of 'Le Cygne' with annotation of metrical structure (dots) and sub-phrase structure (bold lines).
Procedure
The musicians performed the first phrase of 'Le Cygne' by Saint-Saëns along with a metronomic accompaniment, which they heard over headphones. They performed the phrase six times.
Data processing
A spectral analysis was run on each file, the fundamental frequency was extracted, and half-cycles were detected between subsequent local maxima and minima. The half-cycles were
interpreted as vibrato when their rate was within the range of 2-10 Hz and their extent was larger than 0.1 semitone. Note on- and offsets were detected on the basis of a dynamic
amplitude threshold (less than -40 dB relative to the maximum amplitude) in combination with a dynamic pitch threshold (less than 0.3 semitone deviation from the mean pitch).
The resulting data to be analyzed consisted of collected note features: mean amplitude, mean vibrato rate of pitch vibrato, and mean vibrato extent of pitch vibrato per note.
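The half-cycle criterion described above can be sketched roughly as follows, assuming the fundamental-frequency contour has already been extracted and converted to semitones (the function name and sampling details are our own assumptions, not the authors' implementation):

```python
import numpy as np

# Sketch of the half-cycle vibrato criterion described above (an assumed
# re-implementation, not the authors' code). We take an f0 contour already
# converted to semitones, locate successive local extrema, and keep
# half-cycles whose rate falls within 2-10 Hz and whose extent exceeds
# 0.1 semitone.

def vibrato_half_cycles(f0_semitones, sample_rate):
    x = np.asarray(f0_semitones, dtype=float)
    d = np.diff(x)
    # indices where the slope changes sign -> local maxima/minima
    extrema = np.where(np.sign(d[1:]) != np.sign(d[:-1]))[0] + 1
    half_cycles = []
    for a, b in zip(extrema[:-1], extrema[1:]):
        duration = (b - a) / sample_rate      # seconds per half-cycle
        rate = 1.0 / (2.0 * duration)         # full-cycle rate in Hz
        extent = abs(x[b] - x[a])             # semitones, peak to trough
        if 2.0 <= rate <= 10.0 and extent > 0.1:
            half_cycles.append((a, b, rate, extent))
    return half_cycles
```

A 6 Hz sinusoidal contour with an amplitude of half a semitone, for example, yields half-cycles with rates near 6 Hz and extents near one semitone peak-to-trough.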
RESULTS INTERVIEW
Cello
According to JI, vibrato is made on a cello by moving the left hand in a periodic and symmetric way up and down the neck of the instrument around a central pitch. The impulse is
rather large and comes from the arm. According to JI, vibrato is quite natural and easily learned. He saw the function of vibrato in 'Le Cygne' as an aid in the production of a legato
performance and of a warm and lyrical sound. Vibrato was used as part of the phrasing of the music. He used a kind of vibrato that is not too fast and not too exuberant. Some notes
of the phrase were stressed by giving them a fuller sound, which means that he performed those notes with a more expressive and faster vibrato, and with more "meat" of the fingers.
The ends of phrases "died away", which was accompanied by a smaller vibrato. In general, no note was performed the same or with equal vibrato.
Oboe
According to HR, vibrato on an oboe can be made in several ways: by using the throat, the diaphragm, or even the lips or jaw. HR used throat vibrato in 'Le Cygne', because it is quite
fast and expressive. Throat vibrato is produced by a rapid repetition of a short 'a' sound on the oboe. She teaches her pupils to perform a rhythmical and fast vibrato through
synchronization with a metronome. The result is a periodic fluctuation in pitch around a stable pitch center.
HR used small vibrato in her performance of 'Le Cygne' because of its soft and subtle character. She gave the first and fourth notes more vibrato than the other notes of the
first bar. In the second bar, she played the a' intensely and relaxed towards the end of the sub-phrase. The next e' got considerable vibrato; the following eighth notes did not get vibrato,
but "bellies", and the last note got extra vibrato. Dependencies between vibrato and other performance aspects are, according to HR: vibrato rate and extent increase with the
resistance of a tone, and vibrato rate increases with the loudness of a tone and is influenced by the rhythm of the accompaniment.
Tenor
According to AO, a singer will naturally sing with vibrato if he or she breathes correctly and the air flows fluently. So AO does not produce vibrato; instead he lets it come naturally as
a result of natural singing. In the recording session, he had to sing 'Le Cygne' on a single vowel, without text, which he found a bit unnatural. AO sang the entire piece with the same
vibrato; the only differentiation he made was to stop or start the vibrato. The first measure he performed legato, which naturally included vibrato. The long a' of the
Theremin
A theremin is an electronic instrument controlled by moving both hands towards two antennas. The left hand determines the loudness and the right hand controls the pitch of the
electronically generated tone. According to LK, finger positions include all positions between a closed hand (finger position 0, relative low tone) and an open hand (relative high
tone). She makes vibrato by moving her hand to the left and to the right, which constitutes one vibrato cycle. The start is at the minimum pitch, which equals, according to LK, the
perceived pitch of the note. The general vibrato principle is to let the vibrato and volume change together. This means that a note starts soft and without vibrato and then builds up in
volume and vibrato. In 'Le Cygne', LK used lyrical vibrato, which is fast and wide and differs from melancholic, expressive, or nervous vibrato. The shorter notes in the piece did not
need much vibrato; longer notes did. The function of vibrato was expression. Special treatment of notes was, however, achieved by playing without vibrato. For example, LK gave the
long a' a long start without vibrato.
Violin
As RK told us, the vibrato on a violin is made by rotating the fingers of the left hand up and down the neck of the violin. This movement is a regular movement around a central pitch
and is controlled by the fingers, the hand or the arm, or a combination of the three. RK himself uses arm vibrato. RK performed 'Le Cygne' with relatively large vibrato, more like a cello than
like a violin. The function of vibrato was to color the tone. The first two measures were in his opinion quiet, while the second two measures were more intense. He wanted
to reflect this in his performance: first measure, a calm and fluent movement; second measure, leaning on the long a' and relaxation towards the end; third and fourth measures,
generally faster vibrato, leaning on the e', and the scale with equal intensity.
Table 1
Coding of the notes within the analysis along three structural descriptions (metrical stress, phrase position and melodic charge).

Note   Metrical stress   Phrase position   Melodic charge
g''           0                 0                 0
f#''          2                 1                 5
b'            2                 1                 4
e''           1                 1                 3
d''           2                 1                 1
g'            2                 1                 0
a'            0                 1                 2
b'            3                 1                 4
f#'           2                 1                 5
g'            3                 1                 0
a'            1                 1                 2
b'            3                 1                 4
d''           3                 1                 1
e''           2                 1                 3
f#''          3                 1                 5
b''           0                 2                 4
The metrical stress is related to the metrical hierarchy of 'Le Cygne' which is indicated in figure 1 and follows the metrical indication at the start of the piece and a hierarchical model,
such as described by Lerdahl & Jackendoff (1983). Metrical stress increases with metrical hierarchy (for the coding of individual notes see table 1) and the prediction would be that
vibrato rate and extent increase with metrical stress. For the phrase positions, we separated the start (first note), middle and end (last note) of each sub-phrase (see table 1). This is in
line with descriptions of rhythmic structure as groupings starting and ending with a structural downbeat (e.g., Cone, 1968) and with a common finding in performance literature that
performers tend to mark phrase boundaries (Palmer, 1989). The melodic charge of notes is coded according to Sundberg et al. (1991). Melodic charge increases with increasing
distance between the melody note and the tonic note of the prevalent key (G major). The prediction is that the vibrato rate and extent, as well as the loudness of notes increase with
increasing melodic charge. Each note is given a relative level of melodic charge (see table 1).
Below, we report the results of three different ANCOVAs. In each analysis, the independent variables are metrical stress (nominal), phrase position (nominal) and melodic charge
(continuous). The dependent variable is mean vibrato rate per note in the first ANCOVA, mean vibrato extent per note in the second, and mean amplitude per note in the third.
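The design of these analyses can be illustrated with a small sketch on synthetic data (not the authors' analysis): the two nominal factors are dummy-coded and combined with the continuous covariate in an ordinary least-squares fit, which is the linear-model core of an ANCOVA.

```python
import numpy as np

# Illustrative sketch on synthetic data (not the authors' analysis):
# an ANCOVA-style linear model with two dummy-coded nominal factors
# (metrical stress, phrase position) and one continuous covariate
# (melodic charge) predicting a per-note measure such as mean vibrato rate.

rng = np.random.default_rng(0)
n = 96  # e.g. 16 notes x 6 repetitions

stress = rng.integers(0, 3, n)   # nominal: 3 levels of metrical stress
phrase = rng.integers(0, 3, n)   # nominal: start / middle / end
charge = rng.uniform(0, 5, n)    # continuous: melodic charge

# Dummy-code a nominal factor, taking level 0 as the reference category.
def dummies(codes, levels):
    return np.column_stack([(codes == k).astype(float) for k in range(1, levels)])

# Design matrix: intercept, factor dummies, and the covariate.
X = np.column_stack([np.ones(n), dummies(stress, 3), dummies(phrase, 3), charge])

# Simulate a dependent variable with a known effect of melodic charge.
y = 6.0 + 0.2 * charge + rng.normal(0, 0.05, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[-1])   # estimated melodic-charge slope, near the true 0.2
```

In a full ANCOVA one would additionally compute an F statistic for each factor by comparing this model with the nested model that omits the factor.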
The individual effect of phrase is significant for all instruments: cello (F(2) = 3.12, p < 0.05), oboe (F(2) = 6.6, p < 0.002), tenor (F(2) = 8.3, p < 0.001), theremin (F(2) = 26.0, p <
0.0001), and violin (F(2) = 3.6, p < 0.05). For the cello, the notes at the start of phrases are on average loudest, those in the middle of a phrase are on average intermediately loud, and
endnotes are generally softest. For the oboe and the theremin, notes at the end of a phrase are generally softer than notes at other positions. For the tenor and the violin, the opposite is the
case: endnotes are on average louder than other notes.
The individual effect of melodic charge is significant only for the oboe (F (1) = 39.8, p < 0.0001). For the oboe, the amplitude of notes rises with the melodic charge of notes.
DISCUSSION
The most articulate answers of the musicians in the interviews concerned the production of vibrato on the instrument, the general characteristics of vibrato, such as its form and pitch,
and the general function of vibrato, such as production of a warm sound, expression or legato performance. When asked, the musicians also indicated a way to use special vibrato to
accentuate certain notes. Surprisingly, this special treatment often consisted of starting notes without vibrato. The musicians were explicit about their expressive intentions, such as
phrasing, contrasting first and second half, and tension and relaxation of the music. They suggested related variations in vibrato.
The strongest results from the analysis of the vibrato data concerned the general considerably strong effect of musical structure on amplitude, vibrato rate and extent, the general
consistency of vibrato characteristics over repetitions that is implied by this strong effect, and the limited relatedness between amplitude, vibrato rate and extent. Interestingly, all
instruments had a significant relation between metrical stress and vibrato rate, while phrase position was for all instruments significantly related to amplitude. The specific direction
of the effects differed between expressive aspects (e.g., amplitude, vibrato rate, and vibrato extent) and between instruments. The suggestion is that different expressive means were
used for different purposes.
In general, it is clear that only a few aspects mentioned by the musicians return in the analysis and, vice versa, only a few clear results from the analysis are mentioned by the
musicians. This is not entirely surprising, since only part of expert behavior is conducted consciously and is therefore available for verbal report (see Ericsson & Simon, 1980).
The performances are instead a result of both automated and consciously directed processes. Nevertheless, there are two inconsistencies between analysis and interview results that
are of direct importance for the study of expressive behavior. First, the musicians talk about expressive aspects of the performance in a sequential way, while the analysis tests for
similar expressive treatment of notes with similar structural descriptions. In some cases, the sequential viewpoint is easily translated into a structural one. In other cases, this is less
easily done and may only lead to confusion; in those cases a sequential viewpoint may be more beneficial. Second, the difference in viewpoint is especially strong where special
treatment of vibrato is concerned. While expressive behavior is theoretically most often related to an intensification of vibrato rate or extent (see, e.g., Sundberg et al., 1991), the
musicians actually mention playing without vibrato to mark a special note.
Acknowledgements
This research has been made possible by the Netherlands Organization for Scientific Research (NWO) as part of the "Music, Mind, Machine" project. We would like to thank
Henkjan Honing for his helpful comments and Huub van Thienen and Rinus Aarts for their help in the data collection and data processing at an earlier stage of the project.
References
Castellengo, M., Richard, G., & d'Alessandro, C. (1989). Study of vocal pitch vibrato perception using synthesis. Proceedings of the 13th International Congress on
Acoustics. Yugoslavia.
Clarke, E. F. (1988). Generative principles in music performance. In J.A. Sloboda (Ed.), Generative processes in music. The psychology of performance, improvisation
and composition (pp. 1-26). Oxford: Science Publications.
Cone, E. T. (1968). Musical Form and Musical Performance. New York: Norton.
Ericsson, K. A., & Simon, H. A. (1980). Verbal Reports as Data. Psychological Review 87 (3), 215-251.
Gleiser, J., & Friberg, A. (in press). Vibrato rate and extent in violin performance. Proceedings of the 6th ICMPC.
Horii, Y. (1989a). Acoustic analysis of vocal vibrato: A theoretical interpretation of data. Journal of Voice, 3, (1), 36-43.
Horii, Y. (1989b). Frequency modulation characteristics of sustained /a/ sung in vocal vibrato. Journal of Speech and Hearing Research, 32, 829-836.
King, J. B., & Horii, Y. (1993). Vocal matching of frequency modulation in synthesized vowels. Journal of Voice, 7, 151-159.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music (pp. 69-104). Cambridge, MA: MIT Press.
Meyer, J. (1992). On the Tonal Effect of String Vibrato. Acustica : journal international d'acoustique, 76 (6), 283-291.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331-46.
Shonle, J. I., & Horan, E. (1980). The pitch of vibrato tones. Journal of the Acoustical Society of America, 67, 246-252.
Seashore, C. E. (1932). The Vibrato. Iowa City, Iowa : University of Iowa.
Seashore, C. E. (1938). Psychology of Music. NY and London: McGraw-Hill Book Company, Inc.
Shipp, T., Doherty, T., & Haglund, S. (1990). Physiologic factors in vocal vibrato production. Journal of Voice 4, 300-304.
Sundberg, J. (1978). Effects of the vibrato and the 'singing formant' on pitch. Journal of Research in Singing, 5 (2), 5-17.
Sundberg, J. (1979). Maximum speed of pitch changes in singers and untrained subjects, Journal of Phonetics, 7, 71-79.
Sundberg, J. (1987). The Science of the Singing Voice. Illinois: Northern Illinois University Press.
Sundberg, J., Friberg, A., & Frydén, L. (1991). Common Secrets of Musicians and Listeners: An analysis-by-synthesis Study of Musical Performance. In P. Howell, R.
West, & I. Cross (Eds.), Representing Musical Structure (pp. 161-197). London: Academic Press.
Vennard, W. (1967). Singing, the mechanism and the technic. New York: Carl Fischer, Inc.
Back to index
Proceedings abstract
Takahiro Aoyagi
hirotaoyagi@hotmail.com
Background:
Aims:
The main purpose of the present study is to identify the tonal hierarchy and the
defining elements of a non-Western mode. The differences in the cognitive schemas of
native and non-native subjects are examined.
This paper proposes a new method of measuring tonal hierarchy. The method
allows less subjective interpretation and is robust against
misinterpretations of the experimental task.
Method:
Two experiments with distinct methods were conducted, in which both Arab and
non-Arab musicians participated.
Results:
The lower scale degrees were generally perceived as more salient. The
relationships among them appear to be more fundamental and to contribute to the
sense of modality. The response patterns of the non-Arab (Western) musicians
conform to the diatonic scale structure.
Conclusions:
Back to index
Proceedings paper
Introduction
Two questions provided the impetus for the present study of expert singers' memorisation strategies
and their recall for the words and music of songs. The first question relates to how expert singers, as
opposed to instrumental musicians, practise and memorise songs. The second question relates to the
interaction of words and music in memory.
Although there is a wealth of anecdotal and pedagogical literature on singers' practising and
memorising strategies, dating from the 18th century, there has been very little empirical research
focusing on what singers actually do when they practise and memorise. This is in contrast to the
literature on instrumental musicians: their practising activities have been explored by music
psychologists and educationalists in a series of case studies (e.g. Miklaszewski, 1989; Chaffin and
Imreh, e.g. 1994, 1996a, 1996b; Nielsen, 1997; Lehmann & Ericsson, 1998); effective memorising strategies
for pianists and other instrumentalists have also been investigated (e.g. Rubin-Rabson, 1937; Ross,
1964; Nuki, 1984; Hallam, 1997). How, then, are singers' strategies similar to those of
instrumentalists, and how are they different, given that singers perform words as well as music from
memory? Expert musicians practise more, for example, than novice musicians (Ericsson,
Tesch-Romer and Krampe, 1990; Ericsson, Krampe & Tesch-Romer, 1993; Sloboda, Davidson, Howe
and Moore, 1996). However, expert musicians' practice strategies differ from those of novice
musicians, and therefore the practice strategies of musicians must change with the development of
expertise (Gruson, 1988; Hallam, 1994, 1997).
It has been shown that the words and music of songs are integrated in memory (e.g. Serafine, Crowder
& Repp, 1984) and that music enhances recall for text (e.g. Rubin, 1977; Hyman & Rubin, 1991;
Wallace, 1994). However the methods used by these researchers required largely
non-musically-trained listeners to make familiarity judgments or to write down the words of texts
including songs with music. Expert singers are accustomed to memorising and performing songs from
memory on a daily basis: what can their strategies tell us about the extent to which they integrate
words and music at different stages of the memorising process? What can they tell us about the extent
to which music influences recall for words, or vice versa?
The research programme took the form of a pilot interview study and three main studies, one
observational and two experimental, which will be outlined in turn.
Interview study
Semi-structured pilot interviews were carried out with five professional singers. They reported
learning the words and music of songs separately before combining them. Their memorising strategies
were primarily for the words rather than the music. These included studying the meaning of the text,
translating it into English if necessary, as well as memorising the words phonetically, by rote. Overall,
the singers described a three-stage process: 1) initial study, 2) learning and 3) deliberate, rather than
implicit, memorisation. These findings echo those of an interview study carried out by Wicinski
(1950, cited by Miklaszewski, 1989b, p.96) who interviewed ten eminent pianists of the day on the
topic of preparing pieces for performance from memory. Seven reported that initial study of the whole
piece formed a first stage; this was followed by a second stage in which they worked on technical
difficulties, while the third and final stage consisted of rehearsals of the whole piece in order to
perfect the final 'interpretation'.
Observational study
The aims of the observational study were, first, to determine the practising and memorising strategies
available to singers and to compare them with those used by instrumental musicians; second, to
compare the strategies used by singers of different levels of expertise; third, to try to distinguish more
effective from less effective strategies.
According to the literature on pianists of varying levels of expertise practising and learning music,
their activities include 'analytic pre-study' and 'mental rehearsal' (Rubin-Rabson, 1937), playing with
hands together and separately (Rubin-Rabson, 1937; Gruson, 1988; Miklaszewski, 1989), playing at
different speeds (Miklaszewski, 1989) and repeating single notes, bars and sections of music (Gruson,
1988; Miklaszewski, 1989). Singers, too, might well carry out 'analytic pre-study' and 'mental
rehearsal'; indeed some of the activities reported by the interview respondents could be defined as
such. Singers, too, might well sing at different speeds and repeat different portions of the music to be
learned. However, playing with hands together and separately is clearly not an option for singers.
Nevertheless, a comparison might be made, for example, between the pianist learning to play the
music for the right hand and the music for the left hand separately, and the singer learning words and
music as independent components of a song.
To what extent would the strategies used by singers of different levels of expertise at different times
during the learning and memorising process be different, and to what extent would they be similar?
We know that musicians' practising strategies change with the development of expertise. Would this
be the case also for singers?
For example, in an observational study of expert, intermediate and novice pianists who learned three
new pieces over the course of ten practice sessions, Gruson (1988) found that in the first session
experts were more likely than the other groups to repeat sections of a piece rather than single notes or
bars, to play with left and right hands separately, to spend more time on each piece and to use
'self-guiding' speech. That is, they began by dividing the task into discrete units and working
methodically and systematically on them. As the sessions progressed, experts spent more time playing
uninterruptedly, repeated fewer notes and slowed down less. While the novice and intermediate
pianists increased their speed of playing steadily throughout the sessions, the experts decided on their
final tempo at an early stage - which tended to be faster than the tempi used by the other groups - and
achieved it by the seventh session. This suggests that they had clearer goals and had a better idea of
how to meet them than the less expert pianists. In other words they had better metacognitive
awareness, which seems to have been confirmed in the course of interviews during which they were
able to describe the varied and complex nature of their strategies.
Hallam's (1997) phenomenographic study of expert and novice instrumental musicians explicitly
investigated their memorising strategies. Like the expert pianists in Gruson's study, the expert
musicians in Hallam's study showed more metacognitive awareness than novice musicians insofar as
the experts were more likely to report using analysis in the course of memorisation. This involved, for
example, noting features of the material to be remembered such as key changes, harmonic structure,
the length of rests and difficult 'exit points'. However they sometimes also used the technique of
memorising largely without conscious awareness, and then linking short memorised sections together
to form longer sections until the whole piece was memorised. Most of the expert musicians reported
combining the two approaches as appropriate to the demands of the particular music to be memorised.
To expand on the aims of the present study as outlined above, then, the first was to find out what
activities would be undertaken by singers that might be directly or indirectly comparable to those of
instrumentalists, identified by Gruson. The second was to compare the activities carried out by singers
of different levels of expertise over the course of the learning and memorising process. For example,
would they work on different lengths of 'practice unit', as Gruson found that pianists repeated longer
or shorter sections according to their level of expertise? Third, would expert singers be as aware of
their goals for practice and memorisation and how to meet them, as the expert instrumentalists in
Gruson's and Hallam's studies?
13 singers (students, amateurs and a group of experienced professional singers who had not taken part
in the pilot interviews) were asked to learn and memorise the same new song, over the course of six
15-minute practice sessions, and to provide a concurrent verbal commentary. The practice sessions
were audio-taped, and the tapes were transcribed and analysed.
The strategies used by the singers were defined initially as 'modes of attempt'. They included: singing the words and music together, either reading from the score or singing from memory; speaking the words without the music; and playing or singing the music without the words, by playing or vocalising the melody, playing the accompaniment, or counting beats aloud. As suggested earlier, these modes of
attempt could not be compared directly with the strategies of instrumental musicians identified in
Gruson's and Hallam's studies. However, some similarities were found between the singers in the
present study and Gruson's expert pianists in that the singers - whether experts, amateurs or students -
chose to work on practice units that gradually increased in length and corresponded to compositional
units such as phrases and verses. As shown in Figure 1, the expert singers were differentiated from the
other groups in that they made more attempts using different modes of attempt and were more likely
than the other groups to speak the words, count aloud and sing from memory.
The expert singers appeared to be more goal-oriented, in that, as reported by the interview
respondents, they memorised deliberately and from the beginning of the practice sessions. This is
illustrated in Figure 2.
Another strategy that distinguished them from the other groups involved focusing on the words
separately from the music. However, although the interview respondents had suggested that studying
the words and music separately was a strategy for the earliest stages of familiarisation with a song, the
expert singers in the present study were more likely to speak the words aloud, a way of studying the
words separately from the music, later in the memorising process.
The aim of distinguishing more effective from less effective higher-level strategies was met initially
by defining the 'best' memoriser and the 'worst' memoriser. The 'best' was the first of the 13 to sing the
whole song entirely accurately from memory. The 'worst' was the singer who took longest to
memorise and made the most errors when she sang from memory. The verbal commentaries provided
by these two singers were then analysed along with their practice and error data.
The 'best' memoriser sang the words and music of the song together rather than separately. She started
memorising early and tested her memory throughout the practice sessions. She worked on a variety of
lengths of practice unit. She made plans and implemented them, monitored and corrected her errors,
and explicitly evaluated her practice. Her approach to practising and memorising thus resembled the
approaches made by the expert pianists described by Gruson (1988): her strategies were varied and
complex; her verbal commentary was detailed and 'self-guiding'. All in all, she appeared to possess a
high degree of meta-cognitive awareness. In contrast, the 'worst' memoriser implemented plans,
monitored errors and evaluated her practice to a much lesser extent; she preferred to sing the music
only, started to memorise comparatively late and consistently repeated the whole song rather than
smaller sections.
The hypothesis that the strategies of the 'best' memoriser were indeed more effective than those of the
'worst' memoriser, in that they would consistently produce better performance outcomes, remains to
be tested. One way to do this would be to undertake an intervention study in which participants of
equivalent levels of expertise would memorise new songs using both types of strategies identified.
Experiment 1
The results of the two analyses of data gathered in the observation study included two apparently
contradictory findings. The professional singers who took part in the pilot interviews reported learning
the words and music of songs separately, and the expert singers in the observation study made more
attempts on the words separately from the music than the other groups did. In contrast, the 'best'
memoriser in the second analysis preferred to sing the words and music together rather than
separately. An experiment was therefore carried out in order to find out if there would be any
advantage, in terms of accuracy and confidence in performance from memory, of memorising words
and music separately, prior to memorising them together, or memorising them together throughout the
whole memorising process.
A new unaccompanied song was constructed and 60 singers were asked to memorise it. The melody
was a folk song, 'tweaked' slightly to remove direct repetitions within the melody, and the text was the
second verse of a not-very-well-known poem. Half the participants were expert memorisers and half
were novice memorisers of songs. They were randomly divided between three conditions. In one
condition they memorised the words first, then the music, and then the words and music together. In
the second condition they memorised first the music, then the words, and then both. In the third
condition they memorised the words and music together all the time. At the end of the memorising
phase, which lasted 20 minutes, the participants were asked to sing the song from memory. They were
then interviewed about their musical education and experiences for ten minutes. At the end of the
interview they were asked to sing the song again. Their word errors, music errors and hesitations in
both performances were scored and analysed. Hesitations represented pauses either because the singer
had forgotten what came next, or to correct errors.
There proved to be no statistically significant differences between the expert and novice memorisers
either in terms of accuracy or confidence. Nor were there any differences in accuracy between the
participants who had memorised the words and music of the song separately and those who had
memorised them together. However
the participants who memorised the words and music together made significantly fewer hesitations
than the participants who memorised the words and music separately, as shown in Figure 4.
If I were asked to offer practical advice to singers on the basis of this finding, I would suggest that,
when time is short, singers are better advised to memorise words and music together than to memorise
them separately if their aim is to 'keep going'.
Why were no significant differences found between participants with more and less experience of
memorising songs? It may be that singers do not become expert memorisers simply by memorising
many songs; expertise is acquired as a result of deliberate practice and memorising, per se, is rarely
the focus of most singers' practising or memorising activities. On the other hand, many of the
participants in this study who were deemed 'novices', on the basis that they were choral singers who
rarely memorised vocal music with words, were also instrumental musicians with experience of
memorising music.
In order to test the hypothesis that the ability to memorise songs accurately is related to the ability to
learn songs accurately, which in turn is a skill acquired through the development of musical expertise,
the participants hitherto deemed expert and novice memorisers were therefore re-grouped on the basis
of the levels of musical education they had attained. 35 participants had 'high', and 25 participants had
'low' levels of musical expertise. Although there seemed to be no effects of experience of memorising
on accurate memory for the song there turned out to be significant effects of musical expertise. That
is, the more musical training a singer had, and therefore the more musically expert he or she was, the
easier it was to learn the song accurately and therefore to recall it accurately.
What was more interesting, from a theoretical as well as a practical point of view, was that, as well as the
significant effect of expertise on accurate memory for the music, there was also a significant effect of
musical expertise on accurate memory for the words: participants who made fewest music errors also
made fewest word errors. Furthermore, memorising the words and the music of a song together
proved a more effective strategy for these comparatively more expert musicians, in terms of accuracy,
than memorising them separately, as shown in Figures 5 and 6.
These findings complement the evidence I have already presented to suggest that singers who
memorise words and music together are more confident when they perform from memory. They also
support previous findings by Rubin (1977), Hyman and Rubin (1990) and Wallace (1994) that recall
for words is enhanced by music.
Experiment 2
The final experiment, undertaken with the help and encouragement of Anders Ericsson and Andreas
Lehmann, investigated three questions. First, are the words of songs recalled primarily in terms of
their semantic meaning, as suggested by the professional singers who took part in the pilot interviews?
Or are they recalled in terms of their 'structural' qualities (defined by Wallace, 1994, as alliteration,
assonance, prosody, rhythm and rhyme) which are emphasised by their musical setting? In other
words, how important is it to understand the meaning of the words of a song for the purposes of
memorising them? This question does not refer to the 'interpretation' of a song in performance, for
which understanding the meaning of the words of the song is clearly paramount. Rather, is it possible
to explain the ability of singers to sing from memory, in languages they do not understand or speak, in
terms of the relationship of the words to the music to which they are set?
We addressed this question by asking expert singers to memorise songs with semantically-meaningful
words and non-semantically-meaningful words, in this case digit strings, and to perform them from
memory within a variety of constraints. If the words of songs are memorised and recalled in terms of
their semantic meaning, songs with non-semantically-meaningful words should be harder to memorise
and recalled much less easily and less accurately than songs with semantically-meaningful words.
Second, how separable are the words and music of newly-memorised songs when they are recalled?
Serafine, Crowder and Repp (1984) played folksongs with interchangeable words and melodies to
non-musically-trained listeners and asked them to rate the words and melodies for familiarity when
they were played a second time with either the same or different melodies and words. They found
what they called an 'integration effect' for the words and music of songs. That is, listeners
remembered the songs better when they heard both the words and the music for a second time, than
when they heard familiar words set to a different melody or a familiar melody with different words.
We wanted to know if there was an integration effect when singers are asked to recall the words and
music of songs as well as when listeners are asked to recognise the words and music of songs. We
addressed this question by comparing recall for the texts of the songs with and without melody, and
the extent to which text and melody 'triggered' recall for each other.
Third, what is the relationship between speed of acquisition for songs and effective memorisation?
Lehmann and Ericsson (1995) propose that musicians hold abstract mental representations of the
music they perform that underlie both memorisation and performance skills. This proposal is
supported by their finding that pianists who memorise quickly are able to carry out complex tasks
from memory, such as transposition. We used Lehmann and Ericsson's memorising paradigm to
explore the relationship between speed of acquisition and the ability to 'manipulate' the texts and
melodies of the songs once they had been memorised. This involved showing each participant the
musical score of the song to be memorised and simultaneously playing a recording of the melody and
accompaniment. The score was then removed and the participant was asked to sing as much of the
song as he or she could remember to the recorded accompaniment. These pairs of trials, singing first
with the score and then without, were repeated until the participant was able to sing two consecutive
accurate performances of the whole song from memory. Speed of acquisition, then, was measured by
the number of pairs of trials preceding memorisation to criterion: the fewer pairs of trials the
participant needed, the faster the song was memorised.
20 singers with high levels of musical expertise, most of whom had participated in the first
experiment, took part in this study. Each carried out two sessions in which they memorised an
unfamiliar song, one with a word-text and one with a digit-text, to different but matched melodies.
Once each song was memorised to criterion they then performed a series of 15 tasks designed to
assess the extent to which they could retrieve the text and melody independently, modify the text and
melody, and respond to different types of cues. The experimental sessions were recorded on
audio-tape and the participant's performance on each task was transcribed and scored. Measures
included accuracy, latency and task duration.
The first question was whether the words of newly-memorised songs are memorised and recalled
primarily in terms of their semantic meaning or as a component of the melody. We predicted that
participants would take more time to memorise songs with digits instead of words than songs with
semantically-meaningful words in English, and that if understanding the meaning of the words of the
song were crucial for recall then the songs with word-texts would be recalled more easily and more
accurately than the songs with digit-texts. Eight post-memorisation tasks measured recall for text.
Figure 7 shows that, as predicted, songs with digit-texts took longer to memorise.
However it was not the case that participants recalled the songs with word-texts consistently more
accurately and faster than digit-texts. In fact, as shown in Figure 8, digit-texts were recalled more
accurately than word-texts in one task, and word-texts were not recalled any more accurately than
digit-texts in the other tasks.
On the other hand, as shown in Figure 9, recall was slower for digit-texts when participants were
asked to recall the text of the whole song at speed both with and without the melody, and in two other
tasks.
So understanding the meaning of the song clearly does play an important part in recall, though
perhaps not as much as is sometimes thought.
The second question was how separable the words and music of newly-memorised songs are when
they are recalled. Again, the answers were equivocal. We found that, as shown in Figures 10 and 11,
participants recalled digit-texts but not word-texts both faster and more accurately with than without
the music.
The results of the cueing tasks, however, showed that both types of text and melody are more likely to
be encoded and retrieved as integrated than independent components. Although participants were not
able to retrieve the appropriate text when cued with a fragment of melody any faster than they were
able to retrieve the appropriate melody when cued with a fragment of text, they found it much harder
to sing the appropriate melody without also singing the text than they did to speak the words without
also singing the melody.
The final question concerned the relationship between speed and effectiveness of memorisation. We
predicted significant correlations between speed of acquisition and performance, such that the faster
participants memorised the quicker and more accurately they would perform on the 15 tasks devised
to show different aspects of memory for the song. We found significant correlations between speed of
acquisition and performance on seven tasks, and also between speed of acquisition and memorising
ability as measured, in terms of accuracy and confidence, in the previous experiment. Six of these
tasks, however, measured speed rather than accuracy of recall: for example, speaking the words of the
whole song at speed; retrieving fragments of the text and fragments of the music 'reversed';
responding to cues, both 'forward' and 'reverse'; and singing the phrases of the song in reverse order.
The seventh task involved singing the
pitches of the melody of the song only, without rhythm, to the regular beat of a metronome.
Lehmann and Ericsson (1995) argue that the ability to form rapid mental representations of a piece of
music, measured as speed of acquisition, underlies the ability to produce performances from memory
that are both stable and flexible, as exemplified by their transposition task. It may well be, then, that
this same ability also underlies the ability to perform from memory at speed and to carry out certain
tasks involving 'manipulation' of the memorised song. On the other hand, it may be that performance
on the tasks that did not correlate with speed of acquisition is better explained in terms of the
automatisation of performance resulting from the rote memorisation of text and melody together.
These tasks included the accurate performance of the song with accompaniment and at the same
tempo as that at which the song was originally memorised, as required in circumstances more usual
than those of this study.
Conclusion
How are singers' strategies similar to those of instrumentalists, and how are they different? Because
singers have words to memorise as well as music, they have some options for practice that are
unavailable to instrumental musicians. On the other hand, like instrumental musicians, singers choose
to practise 'chunks' that correspond to units of the compositional structure of the song they are
learning. There is also evidence that singers with higher levels of expertise, like instrumental
musicians with higher levels of expertise, show more metacognitive awareness, in that they use a
wider variety of strategies than less expert singers and are more likely to memorise deliberately rather
than implicitly.
Although the small sample of experienced professional singers who took part in the pilot interview
study reported learning words and music separately in the initial stages of memorisation, the evidence
from the observational study suggests that singers are in fact more likely to practise singing the words
and music together. The expert singers spoke the words of the song aloud more than less expert
singers did, suggesting that they were focusing on the semantic meaning of the words of the song, but
they did so much later during the memorising process than might have been predicted from the
interview data.
In fact closer inspection of the observational data from the most and least 'effective' memorisers
showed that singing the words and music together rather than studying them separately was associated
with earlier memorising and more accurate performance from memory. This finding was borne out by
the results of the first experiment: singers of varying levels of musical expertise were significantly
more confident in their performances from memory, and singers with high levels of musical expertise
were also significantly more accurate, both in their recall for words and music, when they had
explicitly memorised words and music together rather than separately.
These studies, taken together, have practical implications for singers. Although they might hold
implicit theories about what constitutes efficient practice and memorisation, singers either fail to
practise according to their theories, or their theories are wrong. In other words experienced singers,
even those who can also be defined as expert musicians, do not necessarily practise and memorise as
efficiently as they might: memorising words and music together is clearly a more effective strategy
than memorising them separately.
What does this tell us about the extent to which music influences recall for words, or vice versa? The
second experiment provides evidence to support an 'integration effect' for recall as well as recognition
memory (e.g. Serafine et al., 1984). Further, while I would not wish to discount entirely the effect of
semantic memory for the meaning of the words of songs, it is worth noting that, when memorised and
performed from memory with music, recall for semantically-meaningless texts was in many ways no
different from recall for semantically-meaningful words. This supports the notion that music
structures and therefore enhances recall for words (e.g. Wallace, 1994). These findings show also that
studies involving the participation of expert singers can be a useful way of exploring the interaction of
words and music in memory, and provide a basis for further research.
References
Chaffin, R. and Imreh, G. (1994). 'Memorising for piano performance: a case study of expert
memory'. Paper presented at the 3rd Practical Aspects of Memory Conference, University of
Maryland, Washington, DC, July/August 1994.
Chaffin, R. and Imreh, G. (1996a). 'Effects of difficulty on expert practice: a case study of a concert
pianist'. Poster presented at the 4th International Conference on Music Perception and Cognition,
McGill University, Montreal, Canada, August 11-15, 1996.
Chaffin, R. and Imreh, G. (1996b). 'Effects of musical complexity on expert practice: a case study of a
concert pianist'. Poster presented at the meeting of the Psychonomic Society, Chicago, November 3,
1996.
Gruson, L. M. (1988). 'Rehearsal skill and musical competence: does practice make perfect?' In J. A.
Sloboda (ed.), Generative Processes in Music: The Psychology of Performance, Improvisation and
Composition. London: Oxford University Press.
Hallam, S. (1994). 'Novice musicians' approaches to practice and performance: learning new music'.
Newsletter of the European Society for the Cognitive Sciences of Music, 6, 2-10.
Hallam, S. (1997). 'The development of memorisation strategies in musicians: implications for
education'. British Journal of Music Education, 14 (1), 87-97.
Hyman, I. E. and Rubin, D. C. (1990). 'Memorabeatlia: a naturalistic study of long-term memory'.
Memory and Cognition, 18 (2), 205-214.
Lehmann, A. C. and Ericsson, K. A. (1995). 'Expert pianists' mental representation of memorised
music'. Poster presented at the 36th meeting of the Psychonomic Society, Los Angeles, California,
November 10-12, 1995.
Lehmann, A. C. and Ericsson, K. A. (1998). 'Preparation of a public piano performance: the relation
between practice and performance'. Musicae Scientiae, 2 (1), 67-94.
Miklaszewski, K. (1989). 'A case study of a pianist preparing a musical performance'. Psychology of
Music, 17, 95-109.
MODULATED RHYTHMS
-A New Model of Rhythmic Performance-
Carl Haakon Waadeland
Trondheim Conservatory of Music, Faculty of Arts,
Norwegian University of Science and Technology,
7491 Trondheim, Norway
carl.haakon.waadeland@hf.ntnu.no
1. Introduction:
performer are fundamentally implemented. A basic idea in this respect is to view performed rhythm as a result of mutual
interactions of different movements (oscillations), and to construct a theoretical model describing rhythmic activity by
means of frequency modulated rhythms, where mathematical, trigonometric functions are representatives of "atomic"
movements. The construction of this model is done in a stepwise manner, providing solutions to the following problems:
A. Present a model of rhythmic structure, where information of note values is represented as continuous movements
through attack points.
B. Construct a model of expressive timing, where performed rhythm is viewed as a result of continuous interactions
of movements: one movement modulating another movement.
This somewhat complementary relationship, between rhythmic movements characterizing rhythmic performance of
music and modulated rhythms used as a new technique of rhythm synthesis, constitutes an important axis in the
development of our concepts and models. An illustration of the basic idea we are trying to pursue is given in the figure
below:
Figure 1. Illustration of the relationship between rhythmic movements and modulated rhythms.
The left hand side of Figure 1 illustrates expressive timing, understood as a process where structural properties of
rhythm are transformed into live performances of rhythm, whereas the right hand side indicates the construction of
our models (A) and (B). The elements of model (B) are various non-metronomical movement curves, naturally
interpretable as movements associated with syntheses of rhythm performances. Thus, our idea is that rhythmic
movements typical of live, rhythmic performances are approximated by modulated rhythms.
Within our theoretical model the various syntheses of live rhythmic performances that we are able to create exist
on a purely formal and rather axiomatic level, where mathematical functions and graphic illustrations are
interpreted as different representations of rhythmic performances of music. The musical performance itself,
however, "exists" in an interaction with a temporal unfolding of sound. In collaboration with Sigurd Saue a
MIDI-based computer program has been constructed which converts the mathematical model into audible
syntheses of rhythm. The computer implementation of rhythmic frequency modulation is outlined in non-technical
terms by Waadeland & Saue (1999), Waadeland (2000, Chapter 6), and is given a more technical description by
Saue (2000). At the end of this paper we briefly present some examples of rhythmic performances that can be
simulated by means of modulated rhythms.
Figure 2. Graphic illustration of a possible movement curve when tapping the hand against a table in
perfect synchronization with a metronome. Time is displayed along the horizontal axis, and the hand's
distance from the table is measured along the vertical axis.
Observe that the points where the curve touches the horizontal axis correspond to the instants when the hand
touches the table, producing the audible tap. To be quite explicit, the curve above is given by the mathematical
function:
(*) pf(t) = A[1 - cos(ft)],
where t is time, f is the frequency, i.e. a measure of the speed by which the tapping is performed, and 2A is a
measure of the hand's maximum distance from the table. (In the figure above f = 1 and A = 1.) This curve is, of
course, not the only possible movement curve, but certainly a plausible one. It is, for instance, interesting to note
that on the basis of empirical investigations Viviani (1990) states that sinewaves are easy to approximate by
human movements, and are among the simplest predictable motions. In the following we call pf a pulse, and the
minimal points of pf (i.e. the points where the curve touches the horizontal axis) will be denoted pulse beats. With
reference to Figure 2 we choose the following numbering of the pulse beats of pf :
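For readers who wish to experiment with the model, the pulse (*) and its pulse beats are easy to compute. The following Python sketch (the function names are ours, not part of the model) evaluates pf and lists the first few pulse beats, which occur at t = 2πk/f:

```python
import math

def pulse(t, f=1.0, A=1.0):
    """The movement curve of equation (*): p_f(t) = A[1 - cos(ft)]."""
    return A * (1.0 - math.cos(f * t))

def pulse_beats(f=1.0, n=4):
    """The first n pulse beats of p_f: its minimal points, at t = 2*pi*k/f."""
    return [2.0 * math.pi * k / f for k in range(n)]

# The curve touches zero exactly at the pulse beats (the audible taps),
# and reaches its maximum 2A halfway between them:
assert all(abs(pulse(t)) < 1e-12 for t in pulse_beats())
assert abs(pulse(math.pi) - 2.0) < 1e-12
```

Doubling f halves the spacing of the pulse beats, which is exactly the subdivision operation defined below.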
Figure 3. An illustration showing the relation between different movement curves associated with
metronomic performances of different note values.
Observe that, in accordance with ordinary musical terminology, p2f represents a subdivision of pf in 2, and p3f is
naturally interpreted as a subdivision of pf in 3. We now make the following definition:
Definition of subdivision in k :
Let pf(t) = A[1 - cos(ft)]. The subdivision of pf in k is the pulse pkf(t) = A[1 - cos(kft)], k = 1, 2, 3, ...
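As a computational sketch, subdivision amounts to raising the pulse frequency, assuming the form pkf(t) = A[1 - cos(kft)] suggested by the examples p2f and p3f; the function names below are ours:

```python
import math

def pulse(t, f=1.0, A=1.0):
    """The pulse p_f of equation (*)."""
    return A * (1.0 - math.cos(f * t))

def subdivide(f, k):
    """Subdivision of p_f in k, assuming the form p_kf(t) = A[1 - cos(kft)]:
    the same pulse shape at k times the frequency."""
    return lambda t, A=1.0: pulse(t, f=k * f, A=A)

# p_2f has a pulse beat at t = pi, halfway between two beats of p_f (f = 1),
# where p_f itself is at its maximum:
p2 = subdivide(1.0, 2)
assert abs(p2(math.pi)) < 1e-12
assert abs(pulse(math.pi) - 2.0) < 1e-12
```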
In order to represent complex sequences of note values by continuous movement curves we also need to define an
operation making ties of pulse beats. Note that whereas a subdivision of a pulse is given in a unique way, the
operation of making ties of pulse beats is multivalued, being dependent on the pulse beat on which the tie is to
start. If, for instance, we wish to make ties of n pulse beats, n = 1, 2, 3, ..., we have the following n possibilities:
The tie may start on the first, the second, the third, ..., or the n-th beat. Since the pulse pf has frequency f, a tie of
n pulse beats of pf should be a new pulse with frequency f/n. Thus, we make the following definition:
Definition of n-tie:
It is straightforward to verify that τl is an n-tie with minimal points on the pulse beats of pf with numbers:
l, l+n, l+2n, l+3n, ...
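A possible computational reading of the n-tie is sketched below. The concrete formula is our reconstruction, chosen so that the minimal points of the tie fall on pulse beats l, l+n, l+2n, ... of pf, with beat 1 taken at t = 0:

```python
import math

def tie(t, f=1.0, n=2, l=1, A=1.0):
    """A hypothesised n-tie starting on pulse beat l of p_f: a pulse of
    frequency f/n, phase-shifted so that its minimal points fall on beats
    l, l+n, l+2n, ... of p_f (numbering beat 1 at t = 0). The concrete
    formula is our reconstruction, not quoted from the paper."""
    t_l = 2.0 * math.pi * (l - 1) / f        # time of pulse beat l
    return A * (1.0 - math.cos((f / n) * (t - t_l)))

# A 2-tie starting on beat 1 has minima on beats 1, 3, 5, ... of p_f:
for k in (1, 3, 5):
    t_k = 2.0 * math.pi * (k - 1)            # beat k of p_f with f = 1
    assert abs(tie(t_k, f=1.0, n=2, l=1)) < 1e-9
```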
We are now ready to define a model of metronomic performance of rhythm, MPR:
MPR = Combinations of pulses, pf , subject to the operations of making subdivisions and ties.
Since every note value, as defined in traditional musical notation, is given as a result of subdividing or making ties
of some chosen reference value, it follows that every sequence of note values may be represented by continuous
movement curves using elements of MPR. Therefore, we claim to have obtained a solution of problem (A)
formulated in the introduction. An illustration of how note values, pulses, and movement curves are related, is
given in Figure 4 below. In this figure a quarter note is represented by the pulse: p1(t) = 1 - cos(t) (and
phase-translations thereof). The pulses representing the other note values are calculated using the definitions of
subdivisions and ties.
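As an illustration of how elements of MPR combine, the following sketch (our own hypothetical implementation, not the computer program mentioned above) builds a piecewise movement curve for a sequence of note values, one pulse arc per note, so that the minimal points fall exactly on the note onsets:

```python
import math

def movement_curve(note_values, t):
    """Evaluate a piecewise movement curve in the spirit of MPR (our sketch):
    each note value d, given in multiples of the reference quarter note
    (e.g. 0.5 for an eighth, 2 for a tied half), contributes one arc of the
    pulse of frequency 1/d, so minimal points fall on the note onsets.
    The reference quarter note is p_1(t) = 1 - cos(t), as in Figure 4."""
    onset = 0.0
    for d in note_values:
        length = 2.0 * math.pi * d           # one cycle of the pulse p_{1/d}
        if t < onset + length:
            return 1.0 - math.cos((t - onset) / d)
        onset += length
    return 0.0                               # after the last note: at rest

# Onsets of [quarter, eighth, eighth, half] fall at curve minima:
rhythm = [1.0, 0.5, 0.5, 2.0]
for t0 in [0.0, 2 * math.pi, 3 * math.pi, 4 * math.pi]:
    assert abs(movement_curve(rhythm, t0)) < 1e-9
```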
Figure 4. An illustration showing the connections between a sequence of notes, a movement curve, and a
mathematical representation of pulses as continuous functions, all being related to a robot-like rhythmic
performance executed in perfect synchronization with a metronome. The horizontal axis displays time, t,
where the first beat occurs at time t = 0. Observe how the choice of phase-translations make the minimal
points of the different components of the movement curve coincide at the different beats of the metronomic
performance.
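As a concrete sketch of how MPR represents note values (a minimal illustration assuming, as in Figure 4, the pulse form p_f(t) = 1 - cos(ft), with a subdivision into m parts multiplying the frequency by m and a tie of n beats dividing it by n):

```python
import math

def pulse(f, phase=0.0):
    """Pulse of frequency f: minimal points (beats) where f*t - phase = 2*pi*k."""
    return lambda t: 1 - math.cos(f * t - phase)

quarter = pulse(1.0)   # reference value: p1(t) = 1 - cos(t), beats at t = 2*pi*k
eighth = pulse(2.0)    # subdivision into 2 parts: frequency doubled
half = pulse(0.5)      # tie of 2 quarter-note beats: frequency halved

# All three movement-curve components reach a minimal point together at the
# shared metronome beats, e.g. t = 0 and t = 4*pi.
beat = 4 * math.pi
values = [quarter(beat), eighth(beat), half(beat)]
```

The half-note pulse, by contrast, has no minimal point at t = 2π, which is why only the quarter and eighth components mark that beat. This is the phase alignment shown in Figure 4.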
Figure 5. Flowchart for basic rhythmic frequency modulation. (Compare this figure with the flowchart for
basic FM-instrument, Chowning, 1973.)
In accordance with the figure above we now propose the following definition:
If d = 0, there is no rhythmic modulation (i.e., as we see it, no interaction of movements) and the output of the
operation illustrated in the figure above is simply pf. To give a brief illustration of what might happen when d ≠ 0,
we look at a concrete example:
Example:
Looking at the graph of this frequency modulated rhythm, we immediately observe that the distance between the
modulated pulse beats is no longer the same for every pair of successive beats. If we, for the following discussion, let ∆_j
denote the distance between the pulse beats j and j+1 (modulated or not), we find that ∆_1 is approximately 0.42T, ∆_2 is
close to 0.32T, whereas ∆_3 is approximately 0.26T. Moreover, it can be shown that r is a periodic function with period T
= 2π. Thus, the distances between successive pulse beats make the pattern: L(long) - I(intermediate) - S(short);
L-I-S;...etc. Based on the interpretation of the unmodulated pulse as a drummer playing a sequence of quarter notes in
perfect synchronization with a metronome, it is now tempting to interpret this modulated pulse as a movement curve
associated with a new, non-metronomic performance, theoretically constructed by applying an "interaction" of one
movement, p_3, with another movement, q_{1,π/2}. Since the modulated pulse is periodic with three beats in each cycle, it
seems reasonable to interpret r as a performance of quarter notes in 3/4 meter, where the first beat is performed
lengthened, and the third shortened compared to a strict metronomic performance. Observe, however, that the length of
the measure in this non-metronomic performance is T (=2π), which is equal to the length of the measure in the
metronomic performance interpreted as a performance in 3/4 meter. It should also be noted that the modulated first pulse
beat in the graph above occurs before t = 0. Hence, a performance in accordance with the movement curve of this
example would perform the first beat of every measure a bit early compared to a strict metronomic performance where
the first beat is at t = 0.
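The worked example can be checked numerically. Assuming the modulated pulse has the FM form r(t) = 1 - cos(3t + d·sin(t + π/2)), i.e. carrier p_3 and modulator q_{1,π/2}, its beats are the solutions of 3t + d·cos(t) = 2πk; the choice d = 1 (an assumption on our part) reproduces the beat distances quoted above:

```python
import math

D = 1.0  # assumed peak deviation; this value reproduces the 0.42T / 0.32T / 0.26T pattern

def beat(k, d=D, lo=-1.0, hi=7.0, tol=1e-12):
    """k-th beat of r(t) = 1 - cos(3t + d*sin(t + pi/2)): solve 3t + d*cos(t) = 2*pi*k.
    The left-hand side is strictly increasing for d < 3, so bisection is safe."""
    g = lambda t: 3*t + d*math.cos(t) - 2*math.pi*k
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5*(lo + hi)

T = 2*math.pi
beats = [beat(k) for k in range(4)]
deltas = [(b2 - b1)/T for b1, b2 in zip(beats, beats[1:])]  # fractions of the measure T
```

With d = 1 this yields ∆_1 ≈ 0.42T, ∆_2 ≈ 0.32T, ∆_3 ≈ 0.26T, and the first beat falls slightly before t = 0, exactly as described in the text.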
The example above illustrates some important features of modulated pulses. However, in order to be able to create
syntheses of live performances of more complex rhythms, we need to investigate rhythmic frequency modulation of
subdivisions and ties as well. The manner by which subdivisions and ties of modulated pulses are constructed, is
expressed in the following algorithm:
Algorithm for constructing modulated subdivisions and ties:
The condition 0 ≤ d ≤ f/f´ ensures that the modulated pulse has the same number of beats as the unmodulated pulse
over any interval of length 2π. If d = 0, (α) reduces to the definition of subdivisions and ties in the unmodulated
case. It now makes sense to propose the following definition of a model of live performance of rhythm, LPR:
As illustrated in the example above, the technique of rhythmic modulation applied to pulses creates different
movement curves representing rhythmic performances characterized by various deviations from metronomic
regularity. Carried over to more complex rhythms, we are now able to construct syntheses of a wide variety of
non-metronomic performances of rhythm utilizing repeated applications of the algorithm (α). However, whether
such syntheses of non-metronomic rhythmic performances also yield relevant approximations to live performances
of rhythm is a question which can be answered only on the basis of empirical investigations. In the next section we
give some examples indicating that RFM might create some interesting simulations of live rhythmic performances.
Through our stepwise constructions of the models MPR and LPR, we have now arrived at a situation as illustrated
by the following figure:
Figure 6. Illustration of interrelations between rhythmic structure and the models MPR and LPR.
Observe that if m ∈ MPR, then m is a movement curve associated with a metronomic performance of rhythm, and
δ_i(m) is a movement curve associated with a non-metronomic performance of rhythm, which in some cases also
represents a relevant simulation of live performances of rhythm. Moreover, it should be noted that on the basis of
the defined relations between rhythmic structure and movement curves illustrated in Figure 6, δ_1, δ_2, δ_3, ..., δ_k
represent transformations of structural properties of rhythm into (approximations of) live rhythmic performances,
and may thus be seen as representations of expressive timing, as Clarke defines this notion (Clarke, 1999, p.490).
Hence, we now claim to have obtained a solution of problem (B) formulated in the introduction. A theoretical
interpretation offered by our model construction is to describe representations of expressive timing as non-linear,
continuous transformations of rhythmic structure; or, to put it another way, to view expressive timing as a result of
rhythmic structure being "stretched" and "compressed" by actions of movements. It is at this point interesting to
note that Beek, Peper and van Wieringen (1992), modeling coordinated rhythmic movements such as breathing
and walking, cascade juggling, and polyrhythmic tapping conclude that: "Constrained movement involving more
than one limb segment often leads to modulation." (ibid., p.604). This conclusion of Beek et al. seems to support
our choice of frequency modulation as a mathematical expression of interaction of movements.
the beat level in the accompaniment; the first beat is shortened and the second beat is lengthened,
whereas the third beat is close to one third of the measure length. In the example of Section 3 we
created a movement curve making a cyclic pattern: L-I-S of beat distances, where L = 42%, I = 32%,
S = 26% of the total measure. Hence, if a Vienna waltz accompaniment starts on the third beat of this
movement curve, the following cyclic permutation of L-I-S occurs: S-L-I, where, as before, S = 26%,
L = 42%, I = 32%. According to Bengtsson & Gabrielsson (ibid.) this represents a typical distribution
of beat durations in a Vienna waltz accompaniment. Observe that if d (the peak deviation) is allowed
to vary between d = 0 (no modulation) and the maximum value d = 3 (given by the condition d ≤ f/f´
in the algorithm (α), see section 3), the values of S-L-I are changed between the limits:
S = 33%, L = 33%, I = 33% and S = 17%, L = 61%, I = 22%
It should be noted that the values of the beat durations above refer to Dii ("duration in-in"), i.e. the
duration from the onset of a tone to the onset of the following tone. As strongly emphasized by
Bengtsson & Gabrielsson (ibid.), the values of Dio ("duration in-out"), i.e. the duration from the onset
of a tone to the end of the same tone, are also of crucial importance in rhythmic performance of
music. In the computer implementation of RFM, Dio corresponds to the distance between the MIDI
messages: Note on - Note off. To make a simulation of a Vienna waltz accompaniment sound right,
Dio should be "long" for the first beat and "short" for the second and third beat of every measure (cf.
Bengtsson & Gabrielsson, ibid.).
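The effect of varying the peak deviation can be sketched numerically. Assuming the same modulated-pulse form as in the example of Section 3, r(t) = 1 - cos(3t + d·sin(t + π/2)), the beat-duration profile moves between the two quoted limits as d goes from 0 to 3:

```python
import math

def beat(k, d, lo=-1.0, hi=7.0, tol=1e-12):
    """k-th beat of r(t) = 1 - cos(3t + d*sin(t + pi/2)): solve 3t + d*cos(t) = 2*pi*k."""
    g = lambda t: 3*t + d*math.cos(t) - 2*math.pi*k
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5*(lo + hi)

def profile(d):
    """Durations of the three beats of one measure, as percentages of T = 2*pi."""
    b = [beat(k, d) for k in range(4)]
    return [round(100*(b2 - b1)/(2*math.pi), 1) for b1, b2 in zip(b, b[1:])]

no_mod = profile(0.0)   # the metronomic limit: three equal beats
maximal = profile(3.0)  # beats in L-I-S order; starting the waltz accompaniment
                        # on the third beat gives the S-L-I reading of the text
```

At d = 0 the profile is 33%-33%-33%; at the maximal deviation d = 3 it approaches the quoted limit of roughly L = 61%, I = 22%, S = 17%.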
Example (b): Simulating Non-Synchronization Between Musicians Playing Together:
In live performances involving multiple voices and different musicians playing together, perfect
synchronization between the voices seldom, if ever, occurs. By applying different parameters of
modulation to the different voices of a synthesized, polyphonic performance, various such
occurrences of non-synchronization, called participatory discrepancies (PDs) by Keil (1987, 1995),
may be simulated by means of RFM. In Waadeland (2000) several such examples are given,
including simulations of non-synchronization between the bass player and the drummer in a jazz
rhythm section, and syntheses of rhythmic "phasing" as applied in many compositions by Steve Reich
(e.g. Steve Reich: "Drumming", composed in 1971).
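A minimal sketch of this idea (an assumption-level illustration, not the parameter settings used in Waadeland, 2000): give two voices the same carrier pulse but different peak deviations, and measure the onset asynchrony at each nominal beat:

```python
import math

def beats(d, n, f=3.0, fm=1.0, phi=math.pi/2, tol=1e-12):
    """First n beats of r(t) = 1 - cos(f*t + d*sin(fm*t + phi)), found by bisection."""
    out = []
    for k in range(n):
        g = lambda t: f*t + d*math.sin(fm*t + phi) - 2*math.pi*k
        lo, hi = -2.0, 2.0 + 2*math.pi*k/f
        while hi - lo > tol:
            mid = 0.5*(lo + hi)
            lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
        out.append(0.5*(lo + hi))
    return out

drums = beats(0.0, 6)   # unmodulated voice: strict metronomic beats
bass = beats(0.4, 6)    # lightly modulated voice (hypothetical deviation d = 0.4)
pds = [b - a for a, b in zip(drums, bass)]  # participatory discrepancy at each beat
```

Every onset of the modulated voice falls slightly before or after the corresponding metronomic onset, with the sign and size of the discrepancy varying cyclically over the measure.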
Example (c): Syntheses of Accelerando and Ritardando:
In example (a) above rhythmic modulation was applied to create deviations on the beat level. By
applying some more complex algorithms of modulation, e.g. "modulation of modulation" (serial
modulation), various deviations on the measure level can also be constructed (cf. Waadeland, 2000).
Moreover, by choosing the ratio between "carrier frequency" and "modulating frequency", f/f´, "large
enough" various syntheses of accelerando and ritardando may be created (ibid., Section 7.2).
5. Conclusions:
A main concern of this project has been to give a new description of rhythmic performance of music where
fundamental aspects of movements are incorporated. In doing so we have focused on gestural rhythm understood
as a continuous unfolding through attack points, rather than attack-point rhythm where a finite number of discrete
registrations of rhythmic performance are studied. Some interesting achievements of our investigations are:
1. A model of live performance of rhythm, LPR, is presented where expressive timing is simulated by applying
rhythmic frequency modulation as a new technique of rhythm synthesis. The model MPR is naturally
embedded in LPR. From our model construction it follows:
■ Structure of attack-point rhythm is transformed into structure of gestural rhythm. Thereby we
suggest a shift from a discrete to a continuous representation of rhythmic performance.
■ Written representations of music are transformed into representations of live performances of
music.
■ The technique of RFM is shown to provide a very simple temporal control over movement
curves associated with quite complex rhythms. In other words: by manipulating a small number
of parameters we are able to create a large variety of different simulations of live performances
of rhythm.
2. Apart from approximating real performances, an interesting application of RFM is also to create various
unreal (or, "pathological") unfoldings of rhythm. The construction of such pathological performances is
interesting for several reasons. On the one hand, an understanding of parameters determining a pathological
performance may give valuable insight into what kind of adjustments should be made to make the
pathological performance non-pathological. Moreover, if we were able to correlate these model-constructed
adjustments to physical movements of the performing musician's body, this knowledge would indeed be
quite significant to music education. On the other hand, by appreciating the pathological performance as
valuable in its own right a new standard, or maybe put more appropriately, a new esthetics of rhythmic
performance is suggested.
3. The possibilities of creating various unreal rhythmic performances indicate that RFM synthesis may be
applied as a new compositional tool of electro-acoustic music. This is also illustrated in our simulation of
the phasing technique of Steve Reich.
A full understanding and comprehensive application of the RFM technique is still only in its beginning. Various
theoretical developments of our model can certainly be made and new interpretations and applications to a larger
class of rhythmic unfoldings than here presented may find their support in empirical investigations. Some such
ideas are presented in Waadeland (2000).
Acknowledgments:
The author is very grateful to Ola Kai Ledang, Department of Musicology, for encouraging support and critical
comments during these studies, and would also like to express his sincere gratitude to Sigurd Saue, Department of
Telecommunications, Acoustics, for fruitful cooperation in the development of the computer implementation of
rhythmic frequency modulation. Financial support for this project was provided by the Norwegian University of
Science and Technology.
Notes:
1. This paper presents some basic results from the author's doctoral dissertation: "Rhythmic Movements and Moveable
Rhythms" (Waadeland, 2000). Some of these ideas have also been presented in complementary manners in Waadeland (1999).
References:
Alén, O. (1995). Rhythm as Duration of Sounds in Tumba Francesa. Ethnomusicology, 39 (1), 55-71.
Beek, P.J., Peper, C.E. & van Wieringen, P.C.W. (1992). Frequency locking, frequency modulation,
and bifurcations in dynamic movement systems. In Stelmach & Requin (Eds.), Tutorials in Motor
Behavior II (599-622). North-Holland.
Bengtsson, I. (1974). Empirische Rhythmusforschung in Uppsala. Hamburger Jahrbuch für
Musikwissenschaft, 1, 195-219.
Bengtsson, I. & Gabrielsson, A. (1977). Rhythm research in Uppsala. In Music, room, acoustics
(19-56). Stockholm: Publications issued by the Royal Swedish Academy of Music, No.17.
Bengtsson, I. & Gabrielsson, A. (1983). Analysis and synthesis of musical rhythm. In J. Sundberg
(Ed.), Studies of music performance (27-59). Stockholm: Publications issued by the Royal Swedish
Academy of Music, No.39.
Bengtsson, I., Gabrielsson, A. & Thorsén, S.M. (1969). Empirisk rytmforskning. Swedish Journal of
Musicology, 51, 49-118.
Chowning, J.M. (1973). The Synthesis of Complex Audio Spectra by Means of Frequency
Modulation. Journal of the Audio Engineering Society, 21 (7), 526-534.
Clarke, E.F. (1999). Rhythm and Timing in Music. In D. Deutsch (Ed.), The Psychology of Music,
Second Edition (473-500). Academic Press.
Gabrielsson, A. (1999). The Performance of Music. In D. Deutsch (Ed.), The Psychology of Music,
Second Edition (501-602). Academic Press.
Keil, C. (1987). Participatory Discrepancies and the Power of Music. Cultural Anthropology 2 (3),
275-283.
Keil, C. (1995). The Theory of Participatory Discrepancies: a Progress Report. Ethnomusicology, 39
(1), 1-19.
Palmer, C. (1997). Music Performance. Annual Review of Psychology, 48. 115-138.
Prögler, J.A. (1995). Searching for Swing: Participatory Discrepancies in the Jazz Rhythm Section.
Ethnomusicology, 39 (1), 21-54.
Saue, S. (2000). Implementing Rhythmic Frequency Modulation. In C.H. Waadeland, Rhythmic
Movements and Moveable Rhythms, Appendix II (252-276). Trondheim: Department of Musicology,
Norwegian University of Science and Technology.
Seashore, H.G. (1937). An objective analysis of artistic singing, In C.E. Seashore (Ed.), University of
Iowa studies in the psychology of music: Vol.IV. Objective analysis of musical performance
(12-157). Iowa City: University of Iowa.
Viviani, P. (1990). Common Factors in the Control of Free and Constrained Movements. In M.
Jeannerod (Ed.), Attention and Performance XIII (345-373). Lawrence Erlbaum Associates, Publ.
Waadeland, C.H. (1999). Rhythmic Frequency Modulation - A New Synthesis of Rhythmic
Expression in Music. In Feichtinger & Dörfler (Eds.), DIDEROT FORUM on Mathematics and
Music. Computational and Mathematical Methods in Music (335-350). Vienna: Österreichische
Computer Gesellschaft.
Waadeland, C.H. (2000). Rhythmic Movements and Moveable Rhythms - Syntheses of Expressive
Timing by Means of Rhythmic Frequency Modulation. Dissertation. Trondheim: Department of
Musicology, Norwegian University of Science and Technology.
Waadeland, C.H. & Saue, S. (1999). Computer Implementation of Rhythmic Frequency Modulation
in Music. In J. Tro & M. Larsson (Eds.), Proceedings 99 Digital Audio Effects Workshop,
Trondheim, December 9-11, 1999 (185). Trondheim: Department of Telecommunications, Acoustic
Group, Norwegian University of Science and Technology.
Back to index
Proceedings abstract
Background:
Theories of frequency modulation in vocal vibrato must address the controllability of the rate and extent of
fundamental frequency. It is reasonable to assume that one would not develop motor control unless
the results of the control are perceptible. The literature is equivocal on the issue of perceptibility and
controllability of rate and extent of vocal vibrato among professional singers. In particular, detailed
psychoacoustic data regarding auditory discrimination of vibrato rates and extents for singers are
needed.
Aims:
This paper examines perceptual discrimination of vocal vibrato rate and extent using synthesized
signals, and singers' control of the rate and extent.
Method:
Auditory stimuli were (1) synthesized /a/ with 3 rates (3, 5, 7 Hz) and 3 extents (0.5, 1, 1.5 semitones)
and (2) synthesized /a/ with 2 rates (5 and 7 Hz) and 12 extents (2% to 7.5% in 0.5% increments),
generated by a VAX computer using the program SPEAK at the Recording and Research
Center, the Denver Center for the Performing Arts. Twelve singers attempted to match the synthesized
stimuli (1) above. Recorded voices were analyzed for accuracy of the match. Another group of ten
singers listened to randomly ordered pairs of the synthesized stimuli (2) above, and made
"same/different" decisions for each pair.
Results:
Results of the singers attempting to match given rate and extent of vibrato strongly indicated that the
rate of vibrato is under voluntary control (within 10% accuracy) for the range examined while the
extent is not (within 60%). Results for extent discrimination indicated that the listeners needed
approximately a 2.5% difference in extent in order to detect a difference between two vibrato extents. Acoustic
analyses also revealed that frequency modulation is a potential source of amplitude modulation in vocal
vibrato.
Conclusions:
In spite of some claims that vibrato extent is easier to control than rate, the present study revealed the
opposite: the rate is easier to control than the extent. In addition, acoustic analyses of the singers'
production of vibrato indicated underlying relationships of frequency modulation and amplitude
modulation during vibrato.
Back to index
Proceedings paper
1. Introduction
Numerous developmental studies have shown that at least some of children's knowledge is organized as schemata for familiar events, objects, people or places
and have focused on the role of schemata organisation in children's memory (see for a review Davidson, 1996). The present study is concerned with the ability of
musician and non-musician children (from 10 to 12 years of age) to form musical schemata during listening to a piece. As stressed by Neisser (1976, p.54), "A
schema ... is internal to the perceiver, modifiable by experience and somehow specific to what is being perceived. The schema accepts information as it becomes
available at sensory surfaces and is changed by that information." Thus, the term schemata refers to mental structures which organise information received from our
senses and are continuously altered by that information. Regarding music information derived from listening to a piece, the elaboration of a schema, as pointed out by
Deliège (1997), should be understood as a reduction of the musical piece, rather than the reconstitution of its score.
The main problem in understanding the cognitive processes underlying real-time listening is related to memory for events evolving in time. In the late eighties,
Deliège (1987 for a first sketch) suggested that listening should be considered as a schematisation process built on cues picked up from the musical surface by an
abstraction mechanism (referred to as cue extraction in Deliège 1987; 1989; Deliège & El Ahmadi 1990). The role of this mechanism is to provide landmarks of the
temporal flow of the musical piece and to generate the segmentation and categorization of the musical structures (for a more general view of the model, see Deliège a
& b, and Mélen & Wachsman, this symposium). More closely in relation with the mental processes involved in the present experiments, i.e. the elaboration of a
mental line of a piece of music, are the concepts of "cognitive maps" and of "carte mentale" put forth, respectively, by Tolman (1948) and Pailhous (1970), as cited by
Deliège (1991; 1998), which suggest that the animal or the individual builds up some kind of map to summarize a larger amount of information.
Indeed, this idea is likely to be rather close to the processes involved in the elaboration of a mental line during listening to music.
The validity of this last proposal was tested, with adult musicians and non-musicians, using pieces from the contemporary repertoire (Deliège 1989) and the cor
anglais solo from Tristan und Isolde by Wagner (Deliège 1998). However, the development of this process has not been studied in detail yet. Mélen (1999; Mélen &
3.1. Method
Participants: The children who had participated in the segmentation task were employed here.
Experimental materials and equipment: The same pieces by Schubert and Diabelli were each cut into 8 segments of different lengths at the ends of musical
phrases. The segments were transmitted via a MIDI interface and the MAX software to 8 keys of a device named ScaleGame (for details see Deliège, Delges, Oter &
Sullon, 1998), which permitted real-time listening to the musical information assigned to each key.
Procedure: All participants were tested individually. They had been informed that they would be invited to reconstruct the piece, after 4 or 6 listenings, by using 8
keys of the device, arranged in a different random order for each subject. The duration of the task was not limited. The MAX software (version 2.5) collected the data.
Results and comments:
a) Schubert: 3 children out of the total of 41 correctly rebuilt the piece (7.56%). They were all members of the musicians group (1 from the M1 group and 2 from the M3
group). Within the 38 wrong reconstructions an additional analysis was performed to observe whether the children had been sensitive to the deep structure of the piece, i.e. the
alternation of variations of motifs A and B. These results are interesting but not significant. 10 of the remaining 38 children (28.9%: 5 M and 5 NM) were sensitive
to the deep structure of the piece. However, as in similar research with adults (Deliège 1998), an effect of primacy and recency was found. 14 children (36.8%: 9 M
and 5 NM) correctly chose the first key for the beginning and 18 children (47.3%; 9 M and 9 NM) chose the last key to finish the rebuilding of the piece. Table 1
shows the distribution parameters -primary mode, mean, and mean distance- for the key positions chosen by all the participants. Table 2, on the other hand, refers to
the possible locations chosen by the participants for each key.
Table 1
Distribution parameters -primary mode, mean, and mean distance- for the key positions chosen by all the participants in the Schubert piece.
M1 M3 NM1 NM3
Segm  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.
file:///g|/Sun/Koniari.htm (5 of 10) [18/07/2000 00:31:43]
MUSICAL SCHEMATA IN CHILDREN FROM 10 TO 12 YEARS OF AGE:
Table 2
Localisation attributed by participants to the eight segments
In relation with the eight possible locations (bold characters), the numbers in the columns indicate which segments have been localised in that place, and the number
of participants (in parentheses) who have attributed this location to this segment.
Segm 1 2 3 4 5 6 7 8
1(17) 1(1) 1(2) 1(5) 1(3) 1(6) 1(7) 1(1)
b) Diabelli: Results are slightly better for both groups, and the dominance of the musician group is evident. 3 musicians (M3) and 1 non-musician (NM1) out of 41
children correctly reconstructed the piece (9.75%). Within the 37 wrong reconstructions, 8 respected the deep structure (21.6%; 7 M and 1 NM). As for the primacy
and recency effect, 23 children (62.1%; 10 M and 13 NM) correctly placed the first key and 33 children (89.1%; 16 M and 17 NM) correctly ended with the last key.
Table 3 shows the distribution parameters -primary mode, mean, and mean distance- for the key positions chosen by all the participants. Table 4 shows the possible
locations chosen by participants for each key.
Table 3
Distribution parameters -primary mode, mean, and mean distance- for the key positions chosen by all the participants in the Diabelli piece.
M1 M3 NM1 NM3
Segm  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.
1 5 3 2 1 2 1 1 3.2 2.3 1 1.5 0.5
2 2 3.8 2 2 3 1.2 5 4 2.2 5 4.7 2.5
3 3 3.3 1.5 3 3.4 1 3 3.7 1.4 3 4.1 0.9
4 4 4.5 1 4 4.3 0.7 2 3.3 1.5 4 4 1
5* 6 4.1 1.5 5 4.1 1.1 5 4.6 1.3 5 4.5 0.9
6 6 5.2 1.4 6 4.8 1.6 7 4.9 1.8 6 4.7 1.7
7* 3 3.5 3 7 6.6 0.4 4 3.7 3.3 7 4.5 2.1
8 8 7.6 1 8 7.6 0.4 8 7.6 0.4 8 7.6 0.4
Table 4
Localisation attributed by participants to the eight segments
In relation with the eight possible locations (bold characters), the numbers in the columns indicate which segments have been localised in that place, and the number
of participants (in parentheses) who have attributed this location to this segment.
Segm 1 2 3 4 5 6 7 8
1(23) 1(2) 1(5) 1(1) 1(3) 1(3) 1(4) -
* * * * * 7(10)* 7(18)* -
* Segments 5 and 7 are identical. In order to calculate the distribution parameters, n° 7 is considered as n° 5 when it is located in positions 1-5, and n° 5 as n° 7 when it is located in positions 6-8.
c) General remarks
The absolute values of the difference between the right key location and the location chosen by each child were calculated and analysed with a 2(training) x
2(familiarisation) x 2(composer) x 8(segment's position) ANOVA. Results revealed a main effect of the segment's position (F(1,37)=6.258, p=0.0001), corroborating
the primacy and recency effect already mentioned above, and of composer (F(1,37)=22.670, p=0.0001) indicating better results for the Diabelli piece. The difference
between musicians and non-musicians was not significant (F(1,37)=1.696, p=0.2009). Perhaps musician children's training, in terms of years and hours of tuition, was
not yet sufficient to produce significantly different results. Familiarisation was not significant either (F(1,37)=1.540, p=0.2224). However, the performance of the M3
group was better than the non-musicians', as shown especially in Tables 1 and 3, where it can be observed that for both pieces this group of participants produced a
completely correct mode for the rebuilding of the piece.
4. General Discussion
The aim of the present study was to investigate the role of early music practice in the schematisation process of children during real-time listening. The musician children who
participated in our experiments were at the first steps of their music practice, with a mean of 2.5 years of music training. The data showed that their performances differed
only slightly from those of the non-musician children. In similar studies with adult musicians and non-musicians, a greater differentiation between the performance of the
two groups was observed (Deliège 1989; 1998). However, it is worth noting that the data analysis revealed the same pattern of results as that observed with adults in all
previous experiments, for both the segmentation and the reconstruction task (Deliège 1989; 1998). Music practice, even in its early stages, seems to influence the
processes involved in real-time listening. Additionally, the familiarisation factor had a more prominent role in musician children's memory, revealing better skills in
grasping the musical features.
Analytically, data from the segmentation task showed no effect of the music training factor in the process of grouping formation, i.e. the segmentation, during listening.
Similar groupings were observed in both groups of musician and non-musician children and they were in accordance with the main articulations of the piece. However,
differences in the coherence of the performance between the first and the second segmentation by musicians and non-musicians provide evidence that the processes
engaged during a re-representation of a musical piece might be influenced by musical experience. Musician children, even in the first stages of their music practice, are
more stable in their groupings. This provides evidence that, although the segmentation process has been suggested to be a rather automatic psychological behaviour
(Deliège 1998), music practice has an effect on the stability of its results in the representation of knowledge.
Performance in the reconstruction task was slightly better for musicians. In addition, they seemed more sensitive to the deep structure of the piece than non-musicians. An
effect of familiarisation was also found, mainly in the piece by Diabelli. Children in the NM3 and M3 groups performed better than children from the NM1 and M1
groups. In the Schubert piece, the comparison between the results of the two familiarisation groups showed a slightly better performance for the groups which received three
previous listenings. Children's ability to remember musical schemata formed during listening seems highly related to familiarisation, music practice and the particularities of
the attended piece. As in general memory research, an effect of primacy and recency also appears in memory for musical structures. Children obtained better results in the
reconstruction task for the beginning and the end of the attended pieces.
Participants' performance also differed according to the musical piece that had been listened to. Results for the Diabelli piece were clearly different from those collected
for the Schubert piece. This differentiation might highlight the fact that, even in musical pieces of the same style and period, particularities of the surface have a
different impact on listeners' processing. Cue abstraction, the schematisation process and memory for the resulting schema of a musical piece are influenced by its characteristics,
for instance the flow of the temporal rate.
In general, processes exhibited by both categories of children listeners, musicians and non-musicians, in the formation of musical schemata are not different. They are
simply used more efficiently (and perhaps result in more explicit representations, from a cognitive point of view) by children with music training than by children without
music training. Similar observations have been reported in previous studies with adults (see Deliège & Mélen, 1997). These remarks, together with the fact that
experiments with infants (Mélen, 1999; Mélen & Wachsman, this symposium) have shown evidence of the presence of the cue abstraction mechanism already in the initial
state of human cognition, suggest that this mechanism might be a predisposition of the human mind/brain, which during development and experience becomes modularized, in the
sense of Karmiloff-Smith's theory (1992).
Karmiloff-Smith developed the hypothesis that the human mind/brain starts out with cognitive predispositions that already exist at an early age. As development proceeds,
these predispositions are modulated by external influences and result in specific brain circuits that are activated in response to domain-specific inputs and, in certain cases,
in the formation of relatively encapsulated modules. This modularization process of the human mind/brain sustains the structure of its behaviour and is responsible for its particular
way of acquiring knowledge. During modularization, on the one hand, representations of the information already stored (both innate and acquired) are continuously altered
via a process of redescription or, more precisely, an iterative re-representation of knowledge in different representational formats; on the other hand, implicit
information from these procedural representations is rendered, via a process of "explicitation", into a more explicit form.
Thus, in the light of Karmiloff-Smith's theory, the cue abstraction mechanism might be an innate predisposition of the human mind which during development is influenced
and modulated by environmental constraints such as experience, training or culture. However, additional research is needed to validate this hypothesis and to provide
more evidence for the role of the cue abstraction mechanism in the schematisation process of children of other ages and different levels of musical experience.
Bibliography
Davidson, D. (1996). The role of schemata in children's memory. Advances in Child Development and Behavior, 26, 35-58.
Deliège, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's grouping preference rules. Music Perception, 4(4),
325-360.
Deliège, I. (1989). A perceptual approach to contemporary musical forms. In S. McAdams & I. Deliège (Eds.), Music and Cognitive Sciences. Contemporary
Music Review, 4, 213-230. French translation: Approche perceptive de formes musicales contemporaines. In S. McAdams & I. Deliège (Eds.), La Musique et les Sciences
Cognitives. Bruxelles: Pierre Mardaga, pp. 305-326.
Deliège, I. (1991). L'organisation psychologique de l'écoute de la musique. Des marques de sédimentation - indices et empreinte - dans la représentation
mentale de l'œuvre. Doctoral dissertation, Université de Liège, unpublished.
Deliège, I. (1997). Similarity in processes of categorisation : Imprint formation as a prototype effect in music listening. A preliminary experiment. In M.
Ramscar, U. Hahn, E. Cambouropoulos & H. Pain (eds), Proceedings of SimCat 1997: An Interdisciplinary Workshop on Similarity and Categorisation.
November 28-30, Edinburgh University, Edinburgh, pp. 59-65.
Deliège, I. (1998). Wagner "Alter Weise": Une approche perceptive. Musicae Scientiae, Special Issue, 63-90.
Deliège, I. (this symposium, a). Prototype effect in music listening. An empirical approach on the notion of imprint.
Deliège, I. (this symposium, b). Perception of Similarity and Related Theories.
Deliège, I., Delges, P., Oter, J-C. & Sullon J-M. (1998). Annexe: Le ScaleGame, un outil MIDI multi-fonctionnel. Musicae Scientiae, Special Issue, 117-121.
Deliège, I. & Dupont, M. (1994). Extraction d'indices et catégorisation dans l'écoute de la musique chez l'enfant. In I. Deliège (Ed). Proceedings of the Third
International Conference on Music Perception and Cognition/Actes de la Troisième Conférence Internationale sur les Sciences Cognitives de la Musique.
Brussels: ESCOM Publications, pp. 287-288.
Deliège, I. & Mélen, M. (1997). Cue abstraction on the representation of musical form. In I. Deliège & J. Sloboda (eds). Perception and Cognition of Music.
East Sussex: Psychology Press, pp. 387-412.
Deliège, I., Mélen, M. & Bertrand D. (1997). Development of Grouping Process in Listening to Music : An Integrative View. Polish Quarterly of
Developmental Psychology, 3(1), 21-42.
Dowling, J. (1999). The development of music perception and cognition. In D. Deutsch (Ed.), The Psychology of Music (2nd ed.). San Diego: Academic Press, pp.
603-625.
Karmiloff-Smith, A. (1992). Beyond Modularity: A Developmental Perspective on Cognitive Science. Cambridge, MA: MIT Press.
Mélen, M. (1999). Les principes organisateurs du groupement rythmique chez le nourrisson. Musicae Scientiae, III(2), 161-191.
Mélen, M. & Deliège, I. (1995). Extraction of Cues or Underlying Harmonic Structure : Which Guides Recognition of Familiar Melodies? European Journal of
Cognitive Psychology, 7(1), 81-106.
Mélen, M. & Wachsman, J. (this symposium). Categorisation of musical structures in 6- to 10- month-old infants.
Neisser, U. (1976). Cognition and Reality. New York: WH Freeman & Co.
Back to index
Proceedings paper
Figure 1: The standardized key profile, taken from Krumhansl and Kessler (1982).
Given that the perceived stability of a pitch depends on the tonal context in which it occurs, it is important for musical processing that listeners apprehend the tonal structure of a
musical passage. One approach to how listeners establish a sense of key (but see also Butler, 1989) stems from the recognition that, within a tonal context, those pitches that are
perceived as most psychologically stable are also those pitches that are played most frequently, and for the greatest total durations; similarly, psychologically unstable pitches
occur infrequently, and for the shortest durations. This observation has led to models of tonality perception, such as the Krumhansl-Schmuckler key-finding algorithm (described
in Krumhansl, 1990), that propose that listeners are sensitive to distributional information in music, and identify the key of a musical context based on the degree to which it
matches with such acquired representations of tonal structures. In this vein, a number of studies (Coady, 1994; Laden, 1994; Oram & Cuddy, 1995) have demonstrated listeners'
sensitivity to distributional information.
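The distributional matching idea behind such models can be sketched in a few lines: correlate the pitch-class duration profile of a passage with the standardized major profile transposed to each of the 12 possible tonics, and pick the best match. This is only a minimal sketch, not the full Krumhansl-Schmuckler algorithm (which also uses minor-key profiles); the function name and the toy input are our own.

```python
import numpy as np

# Krumhansl & Kessler (1982) major-key probe-tone profile, C major,
# indexed by pitch class 0..11 (C, C#, D, ...).
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def estimate_key(durations):
    """Correlate a 12-element pitch-class duration profile with the major
    profile transposed to all 12 tonics; return the best-matching tonic."""
    scores = []
    for tonic in range(12):
        profile = np.roll(MAJOR_PROFILE, tonic)   # transpose to this tonic
        r = np.corrcoef(durations, profile)[0, 1]
        scores.append(r)
    return int(np.argmax(scores)), max(scores)

# A toy C-major-like passage: long total durations on C, E, G.
durs = np.array([4.0, 0.2, 1.0, 0.2, 3.0, 1.0, 0.2, 3.5, 0.2, 1.0, 0.2, 0.5])
tonic, r = estimate_key(durs)
```

Because the profile correlation is invariant to the overall scale of the durations, the sketch illustrates why relative prominence, captured through duration and frequency of occurrence, is what such models exploit.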
The Processes of Tonality Perception
What psychological properties underlie listeners' sensitivity to distributional information? One possibility is that sensitivity to this type of structure reflects two complementary
processes: differentiation and organization. By differentiation we mean the distinguishing of pitches from one another in terms of some relevant dimension, such as their total
duration or frequency of occurrence. In contrast, organization refers to a sensitivity to relations between differentiated pitches and the form in which the differentiated pitches are
represented. In a series of experiments we examined these two processes by manipulating the distributional properties of random orderings of the chromatic scale.
Figure 2: Duration profiles for hierarchical, nonhierarchical, and binary stimuli, at tonal magnitudes of 0.5 and 3.5
Second, as the tonal magnitude approaches zero each value progresses toward the mean of all values, resulting in a profile that becomes flat, with the pitches less differentiated.
In contrast, as the tonal magnitude increases the pitches become increasingly differentiated, although at very high tonal magnitudes the value for the tonic pitch so far exceeds those of the remaining pitches that the tonic dominates the profile, effectively becoming the only differentiated pitch.
Organization
Organization was examined by either preserving or destroying the hierarchical structure present in the distributional information. In the hierarchical condition, the
duration/frequency-of-occurrence values mirrored those of the standardized key profile, such that the longest (0th scale degree) and second longest (7th scale degree) pitches were
a perfect fifth apart, and so forth. In contrast, in the nonhierarchical condition the duration/frequency-of-occurrence values were randomly assigned to pitches, thereby destroying
the typical hierarchical relations between the pitches, while preserving the degree of differentiation among the pitches. If the perception of tonality reflects simple memory for
longer or more frequently occurring pitches, then ratings of tonal stability should be similar in the hierarchical and nonhierarchical conditions. If, however, tonality perception
requires hierarchical organization of distributional information, then ratings of the tonal stability of pitches will correspond to the duration/frequency of occurrence of pitches
in the hierarchical condition alone.
A third condition examined the possibility that listeners could extrapolate a hierarchical structure of tonality onto a set of pitches based solely on differentiating the tonic from all
of the remaining pitches. In this binary condition the value for the tonic was identical to the corresponding tonic value in the hierarchical and nonhierarchical
conditions; in contrast, the remaining nontonic pitches all had the same value. If listeners can indeed extrapolate tonal structure by differentiating only the tonic, then ratings of
tonal stability should demonstrate some (presumably hierarchical) differentiation among the nontonic pitches, despite these pitches being undifferentiated in their durational
properties. Sample hierarchical, nonhierarchical and binary duration profiles are shown in Figure 2.
Experiment 1
Experiment 1 examined whether listeners' perceptions of tonality were influenced by (a) the degree of differentiation (i.e., tonal magnitude) of the duration profiles, and (b) the
presence or absence of hierarchical structure in these profiles.
Method
Participants.
Forty students at the University of Toronto at Scarborough, all meeting a 3-year minimum musical training requirement, participated in this experiment in exchange for payment
or course credit. These listeners were assigned to one of two conditions: hierarchical and nonhierarchical. The mean years of musical training for listeners in each condition were
9.9 and 8.2 respectively; this difference in training was not statistically significant, t(19) = 1.38, p > .05.
Materials.
The stimuli consisted of a series of algorithmically composed melodies that were all 10 seconds in length and contained 24 notes, with each pitch of the chromatic scale occurring
twice. The duration of each pitch was determined in the following way. For the hierarchical condition each value in the standard major key profile (Krumhansl & Kessler, 1982)
was raised to one of ten tonal magnitude exponents (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5). These transformed values were then expressed as a percentage of the sum for all
12 values, multiplied by 10000 (the duration of the melody in milliseconds) and divided by two (the number of times each pitch occurs in the melody). Melodies were then created
by randomly ordering the 24 notes, with the onset of a note immediately following the offset of the previous note. For the nonhierarchical condition, a randomized version of the
standardized key profile was created by randomly assigning duration values to the different pitches; all other aspects of stimulus generation were the same as in the hierarchical
condition. Thus, hierarchical and nonhierarchical melodies contained the same number of long and short pitches, but differed in how these pitches were organized.
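The duration computation just described can be sketched as follows. The function name and random seed are our own; the arithmetic follows the text: raise the standard profile to the tonal magnitude exponent, normalize, and divide the 10 000 ms total across two occurrences of each pitch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Krumhansl & Kessler (1982) major-key profile (tonic = index 0).
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def pitch_durations(magnitude, hierarchical=True):
    """Duration (ms) of each chromatic pitch in one 24-note, 10-second
    melody in which every pitch occurs exactly twice."""
    values = KK_MAJOR ** magnitude           # apply the tonal magnitude exponent
    if not hierarchical:
        values = rng.permutation(values)     # destroy the hierarchical structure
    share = values / values.sum()            # proportion of the total duration
    return share * 10000 / 2                 # 10 000 ms total, each pitch twice
```

Note that a magnitude of 0.0 raises every profile value to the zeroth power, producing the flat, undifferentiated profile described earlier, while the permutation step preserves the degree of differentiation but not its organization.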
Figure 3: Correlations between hierarchical and nonhierarchical probe-tone ratings and the standardized key profile, as a function of tonal magnitude.
In summary, Experiment 1 found that perceptions of tonality were affected by both the differentiation and organization of pitches. In terms of differentiation, the fact that a
minimum level of tonal magnitude was required demonstrates that absolute, rather than relative, differences in pitch durations are important for perceiving tonality. In terms of
organization, the finding that listeners only perceive tonality when the differentiated pitches are organized in a hierarchical fashion shows that differentiation alone is not
sufficient, but that the relations between differentiated pitches are important.
Because listeners perceived tonality in melodies in which the pitches were differentiated by their total duration, but completely undifferentiated in terms of their frequency of
occurrence (each pitch occurred twice in each melody), the present findings demonstrate that duration differences can be a crucial distributional property of music for tonality
perception. It is an open question, however, whether or not frequency-of-occurrence information can also play as significant a role in perceiving tonality; this issue is addressed in
Experiment 2.
Experiment 2
Finally, the degree of organization was examined by correlating mean probe-tone profiles for the various conditions with the standardized key profile of Krumhansl and Kessler
(1982); the results of these correlations are shown in Figure 4. For all conditions the strength of the correlation increased as a function of tonal magnitude. However, the
correlation only reached statistical significance in the hierarchical uncontrolled duration condition, suggesting that the perception of tonal structure is especially dependent on
hierarchically organized absolute differences in duration.
Figure 4: Correlations between hierarchical and nonhierarchical probe-tone ratings and the standardized key profile, as a function of tonal magnitude and duration type.
In summary, Experiment 2 confirmed the finding that listeners' perceptions of tonality were affected by both the degree to which pitches were differentiated and the way in which
the differentiated pitches were organized. Adding to these findings is the result that perceptions of tonal stability were primarily dependent upon increasing duration information,
and were not driven by frequency of occurrence per se. When duration information was held constant (i.e., the controlled duration condition), listeners showed some sensitivity to
pitch differentiation through the general increase in correlation with the standardized key profile, although not enough sensitivity to recover the typical sense of tonal
organization.
Experiment 3
Although the previous studies suggest that it is the hierarchical organization of the complete set of differentiated pitches that leads to a percept of tonality, it might be that less
severe forms of hierarchical structure can similarly drive tonal perception. Is it, for example, necessary that there be multiple levels of differentiation and organization among
pitches, or can the percept of tonality be instantiated with a minimum amount of such organization? This question can be examined by creating a new type of melody whose
duration profile has only two possible levels of differentiation. In these binary profiles the duration value for the tonic is preserved, with the durations of the remaining pitches set
to a different level; thus, the only differentiation is between tonic and nontonic pitches. With these binary profiles, the tonal magnitude manipulation thus varies the ratio between
Proceedings paper
Introduction
Periodicity (the recurrence of events at regular time intervals) is generally viewed as a fundamental aspect of rhythm. There is also general
agreement that in music, variously strong periodicities of different sizes can be superimposed. Examples of this are the metric structures of
the various time signatures; a three-four meter, for example, can be seen as a superposition of a shorter periodicity (from beat to beat) with a
three-times-longer one (from downbeat to downbeat).
A series of music psychology studies have dealt with the modelling of listeners' perception of periodicity (e.g. Povel & Essens 1985,
Rosenthal 1992, Miller, Scarborough & Jones 1992, Brown 1993, Parncutt 1994, Large 1994, Todd & Brown 1996, Toiviainen 1997 and
Todd, Lee & O'Boyle 1999). The models available mostly display one or more limitations in their applicability or efficiency. For example,
they may not allow the treatment of the finer fluctuations of tempi present in actual performances (this is the case with Povel & Essens 1985,
Miller, Scarborough & Jones 1992 and Parncutt 1994), or they may react too slowly to such fluctuations (as is the case with Large 1994 and
Toiviainen 1997). A thorough discussion of these properties of the models can be found in Langner (1999, pp. 11-19). More remarkable still
are the differences between the rhythm-theoretical backgrounds of the studies named. Most of them concentrate on the metrical structure of a
piece of music. Underlying this is the notion that there is one correct solution (often the time signature given by the composer), and the
model is deemed successful only when it comes to the same conclusion. However, the studies by Povel & Essens (1985) and Parncutt
(1994) go one step further. The authors argue, or at least suggest, that actual musical meaning possibly depends on the simultaneous
appearance of very different periodicities, which are in fact incompatible with one particular time signature. This idea is supported by the
studies of Yeston (1976), whose analyses of musical works consistently demonstrate the often very rich and highly complex network of
diverse periodicities, both simultaneous and successive: the analyses clearly view this as an important characteristic of artistic quality.
Seen from this perspective, it becomes obvious that models which can deal with more than the perception of meter are essential. If the aim is a
musically relevant analysis of rhythmic structures, it is necessary to detect the complex network of all the periodicities perceived by listeners.
The central question of this study is thus:
What kinds of musically relevant periodicities are present in the rhythmic structure of a piece of music at any particular moment?
Method
In recent music-psychological research into rhythm, oscillation models play an important role. The aforementioned studies by Miller,
Scarborough & Jones (1992), Large (1994) and Toiviainen (1997) for example are based on such models. In these cases, it is presumed that
so-called oscillators can be activated by the periodicity of certain periodically occurring events present in the music; these oscillators
consequently function as periodicity detectors. Oscillators can be regarded either as abstract, mathematically describable objects, or as
concrete populations of nerve cells (such neural oscillators are known from neuroscience; see e.g. Dudel, Menzel & Schmidt,
1996, pp. 367 and 519-537). In both cases, however, the activation of oscillators is simulated by computer, i.e. abstractly.
An oscillation model of this type is also applied in the present study. Its basis is a set of 4080 oscillators, each with a fixed frequency and
phase. The frequency spectrum comprises 85 frequencies, which stretch logarithmically over the range from MM = 7.5 to MM = 960; for each
frequency, the phase spectrum is formed from 48 phases, which cover the region from 0° to 360° at a constant interval of 7.5°, so that any kind of
temporally shifted periodicities can also be detected. (The total number of oscillators results from 85 x 48 = 4080.)
Each oscillator contains what is known as an activation window, which opens and closes periodically, in exact correspondence with the
frequency and phase concerned. The oscillator is only sensitive to input when the window is open: in other words, it can only be activated at
these times. If a musical event occurs while the window is open, it activates the oscillator. In this manner, the dynamics of the music also
come into play, since louder events produce stronger activation. The calculation of activation proceeds step by step through the course of a
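The oscillator bank and activation-window mechanism just described might be sketched as follows. The window width (10% of a period) and the variable names are our assumptions; the actual model is specified in Langner (1999).

```python
import numpy as np

# 85 frequencies spaced logarithmically from MM = 7.5 to MM = 960,
# and 48 phases at 7.5-degree steps: 85 x 48 = 4080 oscillators.
freqs_bpm = np.logspace(np.log10(7.5), np.log10(960), 85)
phases = np.arange(48) * 7.5 / 360.0          # phase as a fraction of a period

def window_open(t, freq_bpm, phase, width=0.1):
    """True if this oscillator's activation window is open at time t (s).
    The window spans `width` of each period, centred on the phase offset."""
    period = 60.0 / freq_bpm
    pos = (t / period - phase) % 1.0          # position within the current cycle
    return pos < width or pos > 1.0 - width

def activate(onsets, loudness):
    """Each event adds its loudness to every oscillator whose window is open
    at the event's onset time, so louder events produce stronger activation."""
    act = np.zeros((len(freqs_bpm), len(phases)))
    for t, amp in zip(onsets, loudness):
        for i, f in enumerate(freqs_bpm):
            for j, p in enumerate(phases):
                if window_open(t, f, p):
                    act[i, j] += amp
    return act
```

Driving this bank with a steady pulse activates most strongly the oscillators whose frequency and phase match the pulse, which is exactly the periodicity-detector behaviour the model relies on.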
Results
Fig. 1 presents the oscillogram for a conga performance of the "Bonanza" rhythm by a percussion student (Example No. 15 in Langner 1999).
The darkest colours appear at those frequencies which correspond to quarter-, eighth- and sixteenth-notes. Thus the strongest activations are
found among the note values present in the piece. (NB: the dark quarter-note band lies somewhat beneath MM = 120; this corresponds to the
recommended tempo of MM = 108.) If we regard the oscillators as periodicity detectors, then this is a plausible result: the very noticeable
darkening on the quarter-note level is particularly revealing, as the rhythm essentially arises from units with the length of a quarter-note
(eighth-note + sixteenth-note + sixteenth-note = quarter-note). Please note when reading the oscillogram that all oscillations emerge with a
certain delay compared with the events which stimulate them. For example, no dark shading at all appears before the second tone has
been played; before this point, the system cannot "know" which frequency belongs to the first two tones.
Fig. 1: Oscillogram for a conga performance of the "Bonanza" rhythm. The activation of the oscillations is indicated by shading at the
corresponding frequency (the more intense the oscillation, the darker the shading). The note values belonging to a particular frequency are
named at the right edge of the graph. The strongest activation can be found at frequencies corresponding to the note values of units present in
the rhythm. The oscillogram reveals the fine tempo deviations of the performer, especially at the level of the sixteenth notes.
On the level of the sixteenth-note, numerous tempo fluctuations made by the player can be observed: there is no consistent, horizontal
oscillation band. However, these fluctuations occur merely on the level of microtiming: even the quarter note band is consistent and almost
completely level.
Some of the shadings may be surprising, such as the light grey band representing the dotted eighth-notes. At first glance, there seems to be no
periodicity at this frequency. A careful inspection of the score reveals however that there are many tones with such a distance between their
onsets, for example the first, third and fifth as well as the fourth, sixth and eighth note of the rhythm. (A dotted eighth-note has the same value
as three sixteenth-notes.) Thus we can state that the procedure is sensitive to such hidden periodicities. It remains to be seen whether this
feature is relevant from a psychomusical point of view.
Discussion I
The example in Fig. 1, along with many more rhythmic analyses (see Langner 1999), demonstrates that the procedure enables
reliable periodicity detection - reliable in the sense that it can indicate all the periodicities clearly present in the rhythm under analysis. In
particular, oscillograms allow the visualisation of many details of performance, such as the fine tempo fluctuations of the player, and also
whether these fluctuations only occur on the microlevel or impact on larger units. It is exactly this possibility - of presenting the
tempo-figuration of a piece of music on more than one level - which seems a particularly beneficial application of the procedure (more
examples can be found in Langner, Kopiez & Feiten 1998 and Langner 1999).
It remains to be seen if all of the many details presented in the oscillograms are actually relevant for musical perception. In particular it must
be ascertained (1) whether the intensity levels indicated by darkening are correct in this form, and (2) whether the aforementioned hidden
periodicities actually play a musical role.
Concerning (2), we can refer to the studies by Yeston (1976), Povel & Essens (1985) and Parncutt (1994) mentioned in the introduction,
which hint at the musical significance of the simultaneous occurrence of a great variety of different periodicities. With regard to (1), it has to be
noted that the oscillation intensities increase when their periodicities occur several times in succession (at present up to a limit of 5
repetitions; this limit is adjustable in the model). This behaviour of the model appears reasonable: it is plausible that the perception of a
periodicity becomes more intense as the repetition rate increases. Despite these positive indications, the question of verifying the model
remains. In the course of applying this method to various examples, a possible means of doing this was discovered quite unexpectedly. A
definite trend became apparent when various performances of the same rhythm were compared: those rhythmic performances which were
musically convincing demonstrated on the whole higher oscillation intensities and a more varied oscillation pattern than the badly-played
versions. This observation led to the idea that two values could be derived from each oscillogram: the overall intensity of oscillation and the
overall intensity of change (the latter being a measure of the amount of variation within an oscillogram); from these two values, an explanation
of the musical quality of a performance could be attempted. An examination of the model would thus be possible in an indirect way: a series
of performances could be evaluated by listeners, and the usefulness of the model for explaining these evaluations could then be
determined by regression analysis.
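Treating an oscillogram as a matrix of activation intensities, the two global values could be computed along the following lines. The precise definitions here (global mean, and mean absolute frame-to-frame difference) are our assumptions; Langner (1999) gives the actual measures.

```python
import numpy as np

def oscillogram_measures(osc):
    """osc: 2-D array (oscillators x time steps) of activation intensities.
    Returns (overall intensity of oscillation, overall intensity of change)."""
    intensity = osc.mean()                          # global level of activation
    change = np.abs(np.diff(osc, axis=1)).mean()    # mean step-to-step variation
    return intensity, change
```

A completely static oscillogram thus yields zero intensity of change, whichever definition of "change" is chosen, while richly varied performances score high on both values.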
Experiment
Within the confines of this paper, only an overview of the method and its most important results can be offered. An in-depth description can
be found in Langner (1999), which also includes a CD with the sound examples used.
The stimuli of the experiment were 62 different conga performances of 10 different rhythms. These were relatively simple rhythms, such as
the "Bonanza" from Fig. 1 or the famous "Bolero" drum rhythm. Most of these performances were played by students of the Musikhochschule
Hannover, but deadpan and average versions were also included. The various versions of a rhythm differed only in their dynamic
shaping and their timing; the average tempo, as well as the timbral and articulatory parameters, was uniform across all versions. To
achieve this, the performances had been transferred to a drum computer. 47 of the performances were evaluated by 24 expert subjects
(people who study or have studied music, ages ranging from 16 to 53 years), and 40 of the performances were evaluated by 127 school pupils
(from a comprehensive school; around half had musical training; age range from 15 to 19 years). The subjects heard the different performances of a
rhythm several times and were then asked to evaluate how well (in the musical sense) the rhythm concerned had been played. A scale of one
to six was used for this purpose. In addition, the subjects were requested to give verbal commentaries on the versions and to explain the
reasons for their evaluations, where possible.
The average ratings were first examined by means of analysis of variance. The factor "version" proved statistically
significant at the p < 0.01 level in all of the rhythms analysed; in the majority of cases, the subsequent multiple comparisons of means also yielded
significant differences (p < 0.01). In a second stage, a multiple quadratic regression analysis was calculated. The dependent variable
was the average rating; the predictor variables were the above-mentioned values for the overall intensity of oscillation and the overall
intensity of change. The regression analysis was carried out separately for the 47 versions evaluated by the experts and the 40 evaluated by the
school pupils. The r²-value, which signifies how well the regression fits the data, is 0.661 for the experts and 0.727 for the pupil subjects. This
means that around 66% and 73% respectively of the variance can be explained by the two predictors. (The effect of the overall intensity of
oscillation was statistically significant for both groups of subjects; the overall intensity of change was significant only for the school pupils.)
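A multiple quadratic regression of this kind, with the two oscillogram values as predictors, can be sketched as follows; the exact model terms used in the study (e.g. whether an interaction term was included) are not stated here, so this is one plausible form.

```python
import numpy as np

def quadratic_r2(intensity, change, rating):
    """Fit rating ~ b0 + b1*I + b2*C + b3*I^2 + b4*C^2 by least squares
    and return R^2, the share of rating variance explained."""
    X = np.column_stack([np.ones_like(intensity), intensity, change,
                         intensity ** 2, change ** 2])
    beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
    resid = rating - X @ beta
    return 1.0 - (resid ** 2).sum() / ((rating - rating.mean()) ** 2).sum()
```

With real rating data, the returned value corresponds directly to the r² figures reported above (0.661 and 0.727).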
Discussion II
The ability of this model to explain 66% and 73% respectively of the variance in the evaluation experiments can be regarded as a very
strong result, given the commonly accepted fact that listeners, including musical experts, are generally not in a position to justify their
evaluations in any objective way. This inability also manifested itself in the course of the experiment, when the verbal explanations were
evaluated (cf. Langner 1999, pp. 103-104). The oscillation model thus allows a large measure of objectification in an area which was
heretofore hardly possible to objectify. However, the model does not imply that there is only one good performance: on the contrary, there
may be very different oscillograms for any given rhythm, all resulting in equally high values for both the intensity of oscillation and the
intensity of change.
The musical relevance of the detected periodicities and therefore of the model as a whole can thus be seen to be confirmed. It is worth bearing
in mind that the question of whether or not a rhythm is well played is of great import: playing rhythms well is the daily goal of thousands of
musicians all around the world.
In addition, one specific aspect of the model has been confirmed. The hidden periodicities detected by the model, which were
mentioned above, perform an important function in the explanation of the evaluations: if they are not included, the explained variance drops
drastically. From this we can conclude that these periodicities are indeed relevant for perception, in agreement with the ideas of Yeston (1976)
and Parncutt (1994) cited above.
In other calculations not discussed here, the positive influence of further mechanisms of the model can be demonstrated, particularly the
intensification of contrasts. When these mechanisms were removed, the explained variance always deteriorated drastically (cf. Langner
1999, pp. 116-117).
The relationship between the oscillations and aesthetic evaluation had already been demonstrated in earlier experiments made with piano
performances (Langner & Kopiez 1995, Langner & Kopiez 1996). The question of whether the explanatory value of the model is merely
culture-specific (the culture in question being that of the West), or if it can be applied more broadly, cannot be answered definitively at the
present time. Initial experiments in comparative evaluation, with the same performances of the same simple rhythms and African subjects,
have been carried out (Kopiez, Langner & Steinhagen 1999). The first regression analyses with the results gathered so far indicate that the
model must be adjusted to explain the African evaluations. It appears that the same model can be applied, but with different values for some
of the parameters. From this, it would be possible to conclude that the same basic perception mechanisms are in effect in the case of African
listeners, but that these work in a slightly different way due to different cultural conditioning.
A comparison of the present study with the other studies referred to above reveals that the most important difference is the goal set for the
research. Whereas the other studies are geared towards the explanation of phenomena of perception, the present study aims ultimately for the
explanation of musical effects. Its focus is thus the treatment of another level of hearing.
The procedure demonstrates none of the limitations to application mentioned in the introduction. It deals with real performances, and reacts as
quickly as possible to deviations in tempo. Moreover, it is online-compatible: in other words, it calculates the oscillations parallel to the
continuing progress of the music. These characteristics would appear to make it suitable for providing the musician (for example, a
percussionist) with instant feedback regarding the performance. For this reason, the development of a real-time version is planned - a PC
software package which would allow the generation of an on-screen oscillogram while the musician plays. The fine deviations of tempo for
example could then be viewed, while the display of additional values for the overall intensity of oscillation and the overall intensity of change
give further information on the musical qualities. Such a software package could be utilised by musicians as an aid to self-regulation, and
would of course also be applicable in instrumental tuition.
An example of such a real-time analysis program, which has already been realised, will be presented in the course of the conference: cf. in
this regard the paper "Real-Time Analysis of Dynamic Shaping" presented by Langner, Kopiez, Stoffel & Wilz.
References
Brown, J.C. (1993). Determination of the meter of musical scores by autocorrelation. Journal of the Acoustical Society of America, 94 (4),
1953-1957.
Dudel, J., Menzel, R., & Schmidt, R.F. (Eds.). (1996). Neurowissenschaft. Berlin: Springer.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music. New York: Academic Press.
Goldstein, E.B. (1999). Sensation & Perception. Pacific Grove: Brooks/Cole.
Kopiez, R. & Langner, J. (1998). The irresistible force of rhythm: Evidence for multiple oscillation maxima in the "spontaneous" generation
of tempo and reactions to trigger-impulses. In Suk Won Yi (Ed.), Proceedings of the 5th International Conference on Music Perception and
Cognition, Seoul, August 26-30 (pp. 91-94).
Detecting intonation errors in familiar melodies
Self-regulated use of learning strategies in instrumental practice
Categorising folk melodies using similarity ratings
Engagement and experience: a model for the study of children's musical cognition
How two voices make a whole: contrapuntal competition for attention in human and machine pulse-finding
Recent discoveries in the psychophysiology of absolute pitch
Symposium introduction
Rationale.
Over the past ten years there has been increasing interest in the strategies
that musicians adopt when they practice and prepare for performance. This
research has developed within several different paradigms. Some has been
conceived within the 'expertise' paradigm, some within a phenomenographic
framework exploring musicians' experiences of practice and performance, and
some within cognitive educational psychology, related to studies of
metacognition and learning. Despite the different origins of the research
there is much commonality in the findings.
This symposium brings together researchers with different perspectives, who
have adopted different methodologies with a view to increasing our
understanding of the issues relating to practice and performance in musicians
and outlining the directions that future research might take within or between
alternative paradigms.
Aims.
Speakers.
Robert H. Cantwell, Neryl Jeanneret, Yvette Sullivan and Ian Irvine, Faculty of
Education, University of Newcastle, Australia
Discussant.
Proceedings paper
Introduction
Professional singers and singing instructors often use figurative expressions that are difficult for a layman to understand in order to describe voice
quality and singing technique, for example, "supported voice", "directed to the mask" or "made to fly". The persistent use of such expressions seems
to indicate that they carry a more-or-less clearly specified meaning and that language use here is of a clearly metaphoric nature.
The present paper concentrates on the study of a pair of concepts: the "forward" or "backward" placement of the singing voice. Clearly, the
words do not indicate the location of the voice source from the point of view of the listener, but certain timbral qualities of the singer's voice, which
can be modified by the appropriate application of specific articulatory mechanisms. Informal observations from the classrooms of the Estonian
Academy of Music (EAM) show that instructors often advise their students to "direct their voice forward", "avoid moving the voice back",
"concentrate the whole sound stream behind the two upper teeth" and so on. At the same time it is clear that the communicative precision of such
metaphorical expressions cannot be great. Yet the success of teaching depends on the unambiguous meaning of the terms used in the study process:
the student has to understand, as clearly as possible, in what direction the instructor wants to develop his or her voice.
Vocal pedagogy uses the concept of "placing the voice" quite frequently. Miller (1977, 1996) defines "directing" the voice as a subjective term that
refers to the vibratory sensations during singing. "Directing the voice" should describe or provoke a certain feeling in the resonance cavities of the
vocal tract. Singers do not usually employ the term "resonance" in its scientific meaning, but rather to describe the vocal timbre freely. According to
Miller, most users of the concept of "directing" the voice do not believe that the "directing" takes place literally. The physical processes
necessary for the production of a specific vocal tone may, however, be coordinated to an extent through a concept of "directing" the voice.
Rulnick et al. (1997: 711) state that two axes can be differentiated in the concept of vocal "placement": up/down and front/back. Sounds whose point
of articulation is close to the alveolar ridge, for example /t/, /d/, /n/, /l/, /s/, /Z/, /S/, /tS/, /dZ/, /T/, /f/, /v/, /p/, /b/ and /m/, may be considered
"front" sounds; /r/ and /g/ are placed in the "middle". Once a front consonant has been established, it may be assumed that the vowel
pronounced together with the consonant will also be carried "forward". Harpster (1984) asks the student to sense the voice in "front", in the area of
the bridge of the nose. He associates the five basic vowels (/i/, /e/, /a/, /o/, /u/) with perceptions on a scale from brighter (/i/ and /e/) to darker (/o/ and
file:///g|/Mon/Vurma.htm (1 of 9) [18/07/2000 00:31:57]
Vurma
/u/), while the vowel /a/ remains in the middle of the scale of perception. He claims that the brighter vowels are felt to be "forward", in the mask area
(in the area of cheekbones, bridge of the nose and eyes) and the darker vowels "back", in the mouth. In his opinion, the balance of the vocal timbre
depends on the right balance of the tone of the "front" and "back" vowels.
In the literature of the field the term pair "front" and "back" can be found in descriptions of vowel quality. Thus, in phonetics (e.g., Wiik 1991,
International Phonetic Association 1999) different vowels are classified by the position where the tongue is arched in the formation of
the specific vowel. The position of the tongue is in turn related to the lower formant frequencies in the vowel spectrum. The dimension of vowel
height depends on whether the tongue is arched high or low in relation to the palate: /æ/ is an example of a low, /i/ of a high vowel, with the
height of the vowel linked to the frequency of the first formant (F1) in the spectrum. Besides the dimension of vowel height, a parallel scale of
closed-open (connected to the openness of the mouth in pronouncing the vowel) is used, where the closed vowel corresponds to the high and the open
vowel to the low vowel. The second dimension in describing vowel space shows whether the tongue is arched in the front, near the teeth, or in the back,
near the velum. This gives the classification of vowels into front (e.g., /i/ and /e/) and back vowels (e.g., /u/ and /o/), which are linked to the
frequency of the second formant (F2) in the sound spectrum: it is high for the front vowels and low for the back vowels.
Titze (1994: 167) writes that singers often sense specific "positions" for vowels. The author assumes that such perceptions of location may be caused
by resonance features in the vocal tract. At certain frequencies standing waves are created by the reflection of the vibrations from the walls of the
vocal tract. The sensation of where the vowel is localized or vibration occurs is related to the localization of maximal pressure of the standing waves
in the vocal tract. Benninger (1994) finds that it is often useful first to recognize the sensations created by the right way of singing and to memorize
them in order to re-create the process. Even though descriptions of such sensations need not be scientifically accurate, they are of great use in
learning singing technique.
The possibility of directing the voice has also been taken literally. For example, Vennard (1967: 81) refers to the acoustic theory of Scripture (1906)
on this topic, based on an earlier treatment by Willis from 1829. The theory of speech production recognized today (Fant 1960), which describes the
functioning of the human vocal tract, does not, however, offer an easy answer to the question of directing the voice. Since Helmholtz (1875), two
major components have been distinguished in the voice production process: the source of the voice and the system of resonators of the vocal tract
that functions as a filter. The spectrum of the so-called glottal sound, produced by the vibration of the vocal cords, is modified under the influence
of the resonators of the vocal tract, so that some partials are enhanced and some are attenuated, enabling the speaker or the singer to produce sounds
of very different timbre by changing the shape of the vocal tract. The singer's ability to change the timbre of the voice is limited to giving this
or that shape to the resonators in the vocal tract and, to an extent, to regulating the qualities of the glottal sound, while the spreading of the voice in
the body is not controlled by the singer's will.
There have been attempts to find a connection between the terminology describing the sound of the voice and objectively measurable acoustic
parameters. Bartholomew (1934) associated the brightness of the voice of a good opera singer with strong partials in the voice spectrum in the
range around 2800 Hz. Cleveland (1977) describes a connection between the type of voice and the frequencies of vowel formants in the singing
voice: in the eight male singers he studied, the vowel formant frequencies were lower if the singer was a bass and higher if the singer
was a tenor, with the formant frequencies of the baritones lying in between. The qualities of vocal timbre connected to
different vocal techniques (e.g., covered, open, throaty, pressed, free) have been described, on the basis of spectrograms, by Berg and Vennard
(1959). Sundberg (1970, 1973) studied bass singers of dark and light vocal timbre and discovered that in the case of darker timbre, the formant
Methods
The present study addresses the possible differences between the voice placed "forward" and "backward" in three different ways. First, a group of
singing teachers, both Estonian and foreign, were asked, in the form of interviews or written expert opinions, three questions: (1) Does the
terminological opposition - voice placed "forward"/"backward" - have a clearly defined meaning? (2) If yes, what qualities are
associated with the voice placed "forward" and "backward"? (3) What vocal techniques should be employed to achieve a voice placement "forward"
and "backward", respectively? These questions were posed to eleven instructors at the Estonian Academy of Music (EAM) and to vocalists through
the Internet mailing list at the address http://www.vocalist.org/.
As the second step, short triads performed by students of classical singing at the EAM were recorded. The students were requested to sing so that
the voice placed "forward" and "backward" could be distinguished: in the first series they were to sing with the voice "placed forward", in the second
with the voice "placed backward". The triads were sung on the five so-called basic vowels, /a/, /e/, /i/, /o/ and /u/. The students were recommended
to sing in D major but could change the key, choosing the tessitura most suitable to their voice. The meaning of the words "voice placed
forward/backward" was not explained. Of the twenty students, eleven were female and nine male, and their period of study in singing ranged from
two to ten years, with an average duration of five years. The recordings were made with a Pioneer D-500 DAT tape recorder in a
low-reverberation studio, with an AKG C414B microphone placed 20 cm from the singer's mouth. The recordings were digitized at a sampling
frequency of 22.05 kHz.
The recordings were then subjected to acoustic analysis with the Voice Analysis software from Tiger Electronics in order to identify the differences
between the voice "placed forward" and "backward". Both LPC and FFT algorithms were used to estimate the second (F2) and third (F3)
formant frequencies of each triad as well as the relative strength of the so-called singer's formant, i.e., the frequency maximum within the range of
2-4 kHz. Since it is difficult to determine the first formant (F1) frequency in the singing voice (especially in the case of a high fundamental), and
the division of vowels into front and back vowels is phonetically linked to the F2 frequency, F1 was not determined in the analyzed material.
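The LPC approach mentioned here can be illustrated in outline. The following Python sketch is a hypothetical minimal implementation, not the Tiger Electronics software: it fits an all-pole model by the autocorrelation method (Levinson-Durbin recursion) and reads candidate formant frequencies off the angles of poles lying near the unit circle.

```python
import numpy as np

def lpc_coeffs(x, order):
    """All-pole (LPC) coefficients via the autocorrelation method
    and the Levinson-Durbin recursion."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)                # remaining prediction error
    return np.array(a)

def formants(x, fs, order=10):
    """Candidate formant frequencies (Hz): angles of LPC poles
    close to the unit circle, sorted ascending."""
    poles = np.roots(lpc_coeffs(x, order))
    poles = poles[(poles.imag > 1e-6) & (np.abs(poles) > 0.7)]
    return np.sort(np.angle(poles) * fs / (2.0 * np.pi))
```

Applied to a signal with a single resonance, the pole angle recovers the resonance frequency; on real sung vowels a higher model order and some pole pruning would be needed.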
In the third stage, a listening test was conducted with a group of singing teachers and students (sixteen people in all), in which the listeners had to
decide whether the recorded triads were sung with the voice "placed forward" or "backward". The test consisted of pairs of triads, both of which were
performed by the same singer in the same key and using the same vowel. The only difference between the members of a pair was that in one case
the singer tried to keep his or her voice "placed forward", in the other "backward". The order of the members in the pairs was random.
There were 87 pairs of triads, 78 of which were chosen from among the material recorded by the students. In order to test the consistency of the
expert opinions, four of the 78 pairs were used twice. Five of the 87 pairs had been computer-synthesized with the Voice
Synthesis software by Tiger Electronics. The members of each synthetic pair differed from each other in the value of one specific acoustic
parameter, which could be either the F1, F2, F3 or F4 frequency, or the relative strength of the spectrum peak (actually F3) within the range of 2-4 kHz
that provisionally corresponds to the singer's formant. The sounds resembled the Estonian vowel /a/. The selection of the formant frequencies of the
vowel was based on the results of the acoustic analysis of the vowel /a/ sung by a baritone who participated in the recording.
The sound volume of all the recordings used in the listening test was levelled. The duration of the test was approximately 16 minutes. If a listener
found it hard to decide whether the voice was placed "forward" or "backward", he or she was allowed to leave the question unanswered. The total
number of answers was 1392.
Results
On the basis of the measurements, it can be posited that the F2 and F3 frequencies as well as the level of the so-called singer's formant tend to
increase rather than decrease in "forward placement" in comparison with "backward placement". "Forward placement" causes the F2 frequency to
rise in 63 per cent and to fall in 17 per cent of the pairs. For F3 the percentages are 55 and 23, respectively. The level of the singer's formant
increases in 70 per cent and decreases in 23 per cent of the cases. The average increase of F2 is 5.2 per cent and that of F3 1.7 per cent. The
level of the singer's formant in the spectrum (measured in decibels) is, on average, 2.9 dB higher for "forward placement" than for "backward
placement".
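The percentages reported here are straightforward to derive from paired "forward"/"backward" measurements. A minimal sketch (the numbers in the usage note are invented, not the study's data):

```python
import numpy as np

def shift_stats(fwd, bwd, tol=1.0):
    """Summarise paired 'forward'/'backward' formant measurements (Hz):
    share of pairs rising/falling by more than `tol` per cent,
    and the mean percent change."""
    fwd = np.asarray(fwd, dtype=float)
    bwd = np.asarray(bwd, dtype=float)
    pct = 100.0 * (fwd - bwd) / bwd          # per-pair percent change
    up = np.mean(pct > tol) * 100.0          # % of pairs rising
    down = np.mean(pct < -tol) * 100.0       # % of pairs falling
    return up, down, pct.mean()
```

For example, `shift_stats([110, 105, 99, 120], [100, 100, 100, 100])` reports 75 per cent of pairs rising, none falling, and a mean change of 8.5 per cent.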
Analyzing the results vowel by vowel reveals that the front/back opposition works most consistently for the vowel /e/, for which
F2 increases in 90 per cent of the cases and decreases in 5 per cent. For /o/, on the other hand, the probabilities of F2 and F3 increasing or
decreasing are nearly equal (36 and 32 per cent, respectively), although the singer's formant is strengthened rather systematically (in 63 per cent of
the cases). The level of the singer's formant rises most consistently for /e/, /a/ and /i/ (in 79, 78 and 72 per cent of the cases, respectively) and
least consistently for /o/ (56 per cent).
Looking at the average values of the changes vowel by vowel (see Table), the F2 of /e/ undergoes the most conspicuous change
(10.7 per cent when "placed forward"). The average change of F2 for /i/ and /u/ is also relatively high (6.5 and 6.8 per cent, respectively). For /o/,
however, the average value of F2 decreases by 1.9 per cent when the voice is "placed forward". The average change in F3 is less
salient; still, the change is positive for all vowels except /o/, where the average value of F3 decreased by 0.8 per cent. On average, the singer's
formant is stronger for "forward placement" in all vowels, the difference being most marked for the vowels /e/ and /i/ (4.5 and 4.0 dB, respectively).
The fact that the F2 frequency changes considerably for the vowels /e/ and /i/ and little for /o/ corresponds to some extent with the
International Phonetic Alphabet: the IPA vowel diagram (International Phonetic Association 1999) differentiates between the front vowels /i/ and /e/
and their back variants /ɪ/ and /ɛ/, but /ɔ/ and /o/ occupy similar positions on the back-front axis.
The data at our disposal do not allow us to draw definitive conclusions about which articulatory mechanisms could produce the acoustic changes
described above. We can only speculate that the increase in the frequency of F2 is caused by the arching of the tongue forward in the mouth
(Sundberg 1987). The increase of formant frequencies (both F2 and F3) could be the result of shortening of the vocal tract, for example through
raising the larynx or retracting the lips and corners of the mouth (Sundberg 1987). The increase in the level of the singer's formant with the voice
"placed forward" may also be caused, in addition to alteration of the vocal tract shape, by changing the operation of the vocal cords, i.e. the
phonation type.
Table. Results of the acoustical analysis of the "forward/backward" voice placement. Data on three variables are presented separately for five
vowels, /a/, /e/, /i/, /o/ and /u/. The three variables are the frequencies of the second and third formants, and the relative strength of the singer's
formant. For each variable, its value for the "forward" placement, averaged for all singers (F2f and F3f, in Hz, columns 2 and 5, or sformf, in dB,
column 8), the difference between the "forward" (f) and "backward" (b) placements (in per cent, columns 3 and 6, or in dB, column 9), and the
statistical significance level of this difference (columns 4, 7 and 10) are given.
Vowel  F2f (Hz)  f-b (%)  p      F3f (Hz)  f-b (%)  p     sformf (dB)  f-b (dB)  p
/a/     967       3.8     0.003   2905      2.4     0.03   -18.1        1.5      0.74
/e/    1659      10.7     0.26    2664      1.6     0.52   -16.2        4.5      0.43
/i/    1923       6.5     0.006   2767      4.9     0.31   -16.3        4.0      0.19
/o/     879      -1.9     0.006   2898      0.3     0.03   -19.3        2.5      0.64
/u/     824       6.8     0.025   2868     -0.8     0.13   -27.7        1.6      0.07
It seems that the greater the difference between the measured formant frequencies in the spectra for "forward" and "backward" placements, the more
easily the listeners perceive the singer's intention, i.e., classify the voice as being placed "forward" or "backward". The Pearson product-moment
correlation between the expected gradient of the formants (i.e., higher frequencies for "forward" placement and lower ones for "backward"
placement) and the correct guesses of the singer's intention by the listeners is higher for F2 (R=0.45, p<0.001) than for F3 (R=0.28, p=0.01). An
analogous correlation can be observed between the inverse formant gradient (where higher frequencies are measured for "backward" placement and
lower ones for "forward" placement) and the number of "wrong" answers, i.e., those considering the singer's intention to have been the opposite of
what it actually was (R=0.44, p<0.001 for F2 and R=0.37, p<0.001 for F3). If the level of the singer's formant is higher for "forward" placement, the
listeners are more accurate in identifying the singer's intentions (R=0.29, p<0.01); if the level of the singer's formant is lower for "forward"
placement than for "backward" placement, the listeners tend to interpret the timbre of the pair contrary to the singer's intention (R=0.25, p<0.05).
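The correlations reported here are ordinary Pearson product-moment coefficients. As a sketch of the computation (with only schematically named variables, since the per-pair data are not reproduced here):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation between two score vectors,
    e.g. per-pair formant gradients vs. counts of correct answers."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()        # centre both vectors
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))
```

Perfectly aligned gradients and response counts would give R=1, perfectly opposed ones R=-1; the study's intermediate values (e.g. R=0.45 for F2) reflect partial agreement.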
In addition to the sung triads, the listening test also contained five synthesized triad pairs. The members of these pairs differed in the F1, F2, F3 and
F4 frequencies and in the level of F3. The majority of experts considered it strange to define the synthesized voice in terms of "forward" and
"backward" placement and found it impossible to perform the task. For those experts who were able to differentiate between the two synthesised
vowel triads, the "forward" quality of the voice was connected both to the increase in the frequency of F1, F2 and F3 and to the strengthening of the
singer's formant (in reality F3).
Conclusions
The terminological opposition of "forward" and "backward" placement is rather widely used in vocal pedagogy. It is based on the century-old idea
that a good quality of singing voice cannot be achieved without the singer "directing" his or her voice into certain body resonators such as various
parts of the cranium, thorax, etc. The concept of "directing" the voice is also connected to the vibratory sensations that arise in the process of
singing, and to the place where different sounds are created in the vocal tract. The idea that the voice can literally be directed into different parts
of the human body has little to do with objective knowledge about the production and propagation of the human voice. However, the results of the
present study demonstrate that the "forward/backward" placement opposition still seems to have a meaning that - within certain limits - can be
objectively defined, and that is understood similarly by most students and professors. "Forward placement" seems to be a desirable aim to be
achieved during training lessons with the help of the teacher. Acoustically, a voice placed "forward" differs from one placed "backward" by a higher
level of the so-called singer's formant. An increase in the F2 and F3 frequencies also seems to be associated with the "movement of the voice
forward from the back", for both the singers and the listeners. As regards the changes in the frequency of F2, the opposition "forward/backward"
used in vocal pedagogy is analogous to the division, generally accepted in phonetics, of vowels into front and back ones, which are distinguished by
their F2 frequencies. On the basis of the changes in the acoustic parameters, one could posit that the terms "voice placed forward/backward" may be
synonymous with the terms "light/dark timbre". Sundberg's study (1970) identifies similar differences in the acoustic features of the voice for
light/dark timbre as we found for the voice placed "forward/backward" in the present study. If such a relationship indeed exists between these terms,
it would be necessary to clarify whether the vocal quality of "forward/backward" placement can always be associated with the concepts of light/dark
timbre. This task, however, remains beyond the scope of the present project.
The results of this study also demonstrate that there is considerable indeterminacy in the use of "forward/backward placement" as terms in vocal
pedagogy. On the basis of the listening test results, this ambivalence may reach quite a high level in some cases (professors either could not
distinguish the placement or gave an opinion different from the one intended by the singer in 64 per cent of the cases). Consequently, the acoustic
(and articulatory) changes when the voice is placed either "forward" or "backward" need not be universal. This situation may result either from
differing (sometimes even opposite) interpretations of the terms "forward/backward placement" by the singers or from their still insufficient
command of their own vocal mechanism, which hinders them from inducing the changes desired by the teacher in their own voice.
It may be said that, despite a certain objectively definable content of the terms "voice placed forward/backward", every teacher seems to use the
concepts idiosyncratically. The quality of being placed "forward" may mean accordance with a certain subjectively defined ideal standard, and the
voices placed "forward" that are considered ideal by different experts need not be the same. In other cases, the term "voice placed forward" may
indicate the direction of certain changes in vocal quality, without being connected to a clearly fixed standard. In working with beginners it should be
remembered that, in order to ensure effective communication, the vocabulary used between the teacher and the student should be developed
gradually, and that the terms used by different singing teachers need not be exact equivalents.
References
Bartholomew, W. T. (1934). A physical description of "good" voice quality in the male voice. Journal of the Acoustical Society of America, 6, 25-33.
Benninger, M. S., Jacobson, B. H., & Johnson, A. F. (Eds.) (1994). Vocal arts medicine: the care and prevention of professional voice disorders.
New York: Thieme Medical Publishers.
Berg, J., & Vennard, W. (1959). Towards an objective vocabulary for voice pedagogy. NATS Bulletin, 15, 10-15.
Bloothooft, G., & Plomp, R. (1986). Spectral analysis of sung vowels III. Journal of the Acoustical Society of America, 79, 852-864.
Bloothooft, G., & Plomp, R. (1988). The timbre of sung vowels. Journal of the Acoustical Society of America, 84, 847-860.
Cleveland, T. F. (1977). Acoustic properties of voice timbre types and their influence on voice classification. Journal of the Acoustical Society of
America, 61, 1622-1629.
Fant, G. (1960). Acoustic theory of speech production. The Hague: Mouton.
Harpster, R. W. (1984). Technique in singing. New York: Schirmer.
Proceedings paper
Introduction
There is considerable evidence that practice plays a crucial role in the acquisition of expertise on a musical
instrument (Ericsson et al., 1993; Sloboda et al., 1996), although there is debate regarding the degree of its
importance (Hallam, 1998; Sloboda and Howe, 1991). One factor mediating the amount of
practice required to learn a piece to performance standard may be the effectiveness of the practice
undertaken.
Much of the research relating to effective studying has been carried out with students in higher
education. Early work was reviewed by Ford (1981). Since then a number of multi-dimensional models
have been developed which attempt to account for the many factors that contribute to learning
outcomes in learners of all ages (e.g. Entwistle et al., 1992; Biggs, 1993). There have also been attempts to
develop students' capacity to "learn to learn" (e.g. Dansereau et al., 1978; Howe, 1991), which particularly
stress the importance of metacognitive activity. This is mirrored by developments in professional training,
which increasingly make use of the notion of reflective learning (Kolb, 1984) and conceptualise the
professional as a "reflective practitioner" (Schon, 1987). These perspectives share the view that
educators should be concerned with enabling students to learn to learn. Within a musical context there has
been little research, although Jorgensen (1997) advocates the view that practice should be seen as a
"self-teaching" activity, with training in conservatoires designed to develop reflective learning.
Research suggests that experts have extensive domain knowledge, which helps them perceive
meaningful patterns in that domain quickly and improves their analysis of a problem, which they
represent at a deeper level. They also have improved short- and long-term memory skills and strong
self-monitoring skills (Glaser & Chi, 1988). Holyoake (1991) suggests that the strategies adopted are
dependent on the context. He cites Dorner & Scholkopf (1991), who suggest that successful problem solvers
have to continually adjust the processes of planning, gathering information, forming hypotheses, making
choices and reconsidering decisions: they know how to do the right thing at the right time. There is no single
"expert" way to perform all tasks. Effective musical practice might therefore be seen as "that which
achieves the desired end-product, in as short a time as possible, without interfering negatively with
longer-term goals" (Hallam, 1997b). This assumes that effective practice might take many forms
depending on the nature of the task to be undertaken; the context within which the task is to be learned; the
level of expertise already acquired; and individual differences. It also suggests that the musician requires
considerable metacognitive skills in order to be able to recognise the nature and requirements of the task;
identify particular difficulties; have knowledge of a range of strategies for dealing with these problems;
know which strategy is appropriate for tackling each task; monitor progress towards the goal; if progress is
unsatisfactory, acknowledge this and draw on alternative strategies; and evaluate learning outcomes in
performance contexts, taking action as necessary to improve performance in the future. The musician
must also have well-developed metacognitive skills for supporting practice, e.g. managing time
appropriately to be able to meet deadlines, maintaining concentration, maintaining motivation, and
understanding what preparations are needed to ensure high performance standards. The aim of this
study is to explore the nature of metacognition and planning in musicians and how these may change as
expertise develops.
The study
A semi-structured interview technique was adopted to enable an in-depth analysis of the musicians'
approaches to practising, interpretation, memorisation and performance. In the early stages of the
research, to validate the content of the interviews, each musician was shown a piece of music and asked to
describe the activities he or she would undertake during the initial stages of learning that work.
For ethical reasons, and because of the difficulties inherent in classifying professionals in terms of levels of
expertise, all of the musicians interviewed were chosen on the basis of peer evaluations of their high levels of
technical competence and their sensitivity in performance. Only those musicians whose performances were
consistently referred to as being of a high standard, both technically and musically, were included in the
study.
Twenty-two professional musicians were interviewed, 11 female and 11 male, with an age range of 22 to
60. They were selected to represent differing lengths of time in the music profession, differing
instruments, and a broad range of musical experience. All were practising freelance professionals
working within a range of musical environments.
The novice sample consisted of 55 string players aged 6-18, with standards ranging from beginner to music
college entrant. They were recorded for a period of ten minutes practising a short piece of appropriate
standard, which they then performed. The task was part of the normal examination procedure for the students.
The taped performance was assessed by two independent judges, marks being awarded out of ten for
overall impression, rhythmical accuracy, steadiness of pulse, notational accuracy, intonation, sense of
tonality and observation of marks of expression. Inter-rater reliability ranged from .82 to .96 (p=.0001).
The students were also interviewed using the same schedule as that used for the professionals.
Each interview was transcribed in full. The content of the tapes from the recordings of the novices was
also transcribed to give a detailed account of their activities while they were practising. This included
information about errors, their correction, stops, starts, poor intonation, inaccurate rhythm, faltering,
repetitions, etc.
Objectivity was established by insisting on agreement between three independent judges on the
categorisation of statements. Where there was disagreement about the categorisation of statements, they
were discussed. Only where complete consensus was reached that a statement supported a particular
categorisation was it included in the analysis.
Despite the fact that all the professional musicians interviewed exhibited great sensitivity in performance
and had considerable technical skills, it became evident that there were indeed clear differences in the way
that practising was undertaken. Initial analysis of the data from the interviews and tapes of the novices also
indicated qualitative changes in the nature of expertise as it developed. This was particularly marked at
advanced levels, i.e. Grade 8 and above. The data from these students were therefore examined separately.
Findings: Professional Musicians
What emerged clearly from the data was the extensive metacognitive skills of the professional musicians.
They demonstrated acute self-awareness of their own strengths and weaknesses, extensive knowledge
regarding the nature of different tasks and what would be required to complete them satisfactorily and had
a range of strategies which could be adopted in response to their needs. This not only encompassed
technical matters, interpretation and performance but also questions relating to learning itself, e.g.
concentration, planning, monitoring and evaluation. Although there were similarities in some aspects of
their practice there was also considerable variation because of individual need. This was well illustrated by
statements from two musicians relating to their teaching.
"My pupils are very different from each other. Some are incapable of playing with any kind of
freedom at all, they are so rigid....their fingers go down like machines and so I encourage
them to get away from that. Others are incapable of playing a simple melody with the right
note values, they distort everything. There are two extremes. My pupils sometimes ask me
whether they can come and sit in on other lessons I give. I say, you are most welcome, there's
no secret in what I'm trying to do but I don't think you'll gain because I am only trying to help
that particular pupil at that moment".
"I think we all have our little idiosyncrasies in fingering because of the shapes and sizes of
our hands and the way we approach it. When I'm teaching I find out what suits them."
This acknowledgement of individual needs in relation to practice appeared consistently throughout the
analysis. It demonstrates metacognitive activity as central in determining the nature of the practice
undertaken. Differences were found in the regularity of practice, its content, the extent to which it was felt
necessary to warm up and the type of technical work undertaken (Hallam, 1995a). All these depended on
what the musician felt was necessary to maintain their standards of performance. There were wide
differences in the ways in which musicians prepared for performance. Some adopted an intuitive, serial
approach to developing interpretation, which evolved as they learnt the piece; others planned in advance,
listening extensively to recordings to develop their ideas; and some were prepared to make spontaneous
changes in performance if they felt these were musically appropriate, even to the extent of creating
technical difficulties (Hallam, 1995b). Metacognitive skills were also demonstrated in relation to
memorisation (for details see Hallam, 1997a).
Learning new music
When learning new music all but one of the musicians initially acquired an overview of it, either by
playing it through or by careful examination of the score. Getting an overview of the work served technical
and musical purposes. It enabled the identification of difficulties, an assessment of tempo, which had
musical and technical implications, and a consideration of the structure of the work and the thematically
important material (Hallam, 1995a, b).
Difficulties identified
The difficulties identified by the musicians varied. This variability was, in part, due to their own individual
strengths and weaknesses.
"I would be looking for areas which I know to be my own weaknesses and therefore areas that
I have got to look out for particularly carefully".
"For me personally, semiquavers, fast passages, low notes are never any problem. If it's got
high notes in, it means I have to put in extra practice to build up strength to play it".
Some general trends did emerge, although there were exceptions. Passages requiring performance at the
extremes of the instrument (high and low for wind players, high for string players) were often seen as
problematic. Particular technical tasks relating to specific instruments, e.g. double stopping, triple
tonguing, position changing, and particular hand shapes for pianists, were frequently mentioned. Generally,
fast technical passages were seen as requiring practice, although for some, once learned, they posed no
difficulties, and one reported:
"I don't use the metronome for speeding things up, if anything I've had to use it for slowing
down".
What was clear was that all the musicians knew what for them was difficult and would in their initial
examination of a piece be looking to identify those passages for practice. They also had a range of
strategies available for that purpose.
Strategies
After the identification of technical difficulties, practice was undertaken to overcome problems. The
musicians had a repertoire of strategies which they were able to utilise as necessary to master differing
technical passages. To some extent these depended on the nature of the instrument itself. However some
general trends emerged. All of the musicians emphasised the importance of either cognitive analysis or
slow meticulous playing in the early stages of learning a new work. After this initial stage one of two main
strategies was adopted: repetitious or analytic (Hallam, 1995a). Practice was goal-oriented, but in some
cases the goal was not learning a particular piece but rather ensuring that technique was of a sufficiently
high standard to deal with difficulties as they arose. Where this strategy was adopted it tended to reflect
the limited nature of the repertoire of the instrument.
Marking the part
The musicians varied in the extent to which they marked things on the music. This depended on perceived
need. Some were reluctant to write anything.
"I try and remember.....I tend not to write much on the music".
"I don't like writing....I find if it is covered in marks I'm looking at the marks instead of the
rhythm and the notes. I find that very upsetting".
"I never mark in accidentals, I never mark in semitones, I don't go in for that at all. It doesn't
mean anything to me".
In contrast some relied heavily on marking parts.
"I write things in to help, very much so."
"I write a lot of things on the music. I have a memory like a sieve."
Others sometimes wrote things in the part.
"Eventually, the day before the show I eventually get round to scribbling a few things,
maybe".
Eleven of the musicians reported extensively marking information on their music, 2 reported making
moderate use of marking, 7 reported using it very little and 2 said that they did not mark things on the part
at all.
Organisation of practice
As with the other aspects of practice there was considerable variability in the level of organisation
reported. Some musicians reported being very well organised.
"I don't have time to waste sitting for hours hammering away ineffectively. If I know I've got
to do something I will do it as fast and as efficiently as possible. If I haven't got anything to
work for I will obviously be a lot more selective in what I'm motivated to practice. Whatever I
practise will be done efficiently and really properly".
"I can achieve a lot in a comparatively short time."
"I try and take a passage. I try and be systematic about it so that I don't always start in the
same place. I decide that today I'm going to take this chunk and work at it".
In contrast some musicians felt a lack of "natural" organisation.
"I don't think I've ever been a very organised practiser.....I wasn't very efficient".
"If I don't have a routine it's just a waste of time for me.....I fritter the time away".
Some musicians appeared to be aware of their lack of organisation and had taken steps to adopt strategies
to help.
"Well in the past what I would do is just, sort of toy with this bit and that bit and do the same
thing every day in the hope that eventually everything would gradually get better but I've
realised that that is not a very good way to do it. That you've really got to decide from the
word go which of the bits are really going to be the ones that need all the work and get down
to those straight away and that's what I try to do now".
"I would start working on it a month in advance and two weeks before the concert I would
learn it from memory.....which never works out because it tends to be a week before....I tend
to do most of my practice when I'm learning it from memory".
"I'm not terribly self-disciplined, that's why I have these little schemes of time schedules and
building up towards the point of performance because in fact I'm very unself-disciplined. A
person who practised easily and more naturally wouldn't need this kind of organisation. So I
do find that my practice isn't always as I would desire it to be. I would like to start every
practise session with slow scales. And in fact I used to start when I did the Dvorak concerto
some three or four years ago, I practised for an hour to three quarters of an hour of slow scales
a day in four different keys which I found very satisfying, very pleasurable, but it's difficult to
get into a routine like that".
It seems as if these musicians are trying to compensate for some degree of natural disorganisation by
imposing schedules on themselves. Others reported difficulties with concentration while they were
practising.
"I get the metronome out. I'm a great believer in the metronome. Well it's a discipline.
When...if you're not feeling like practising......the metronome concentrates your mind in a way
that nothing else seems able to do, because you've got to concentrate on it.....you can't be
thinking wouldn't it be nice to be in the garden, or what am I going to do with my life."
Of the 22 musicians, 7 appeared to be low in "natural" organisation, 10 were moderately organised in their
practice and 5 considered themselves to be highly organised and efficient.
When concentration was considered, 14 reported no problems or good concentration, 3 reported some
difficulties and 5 reported considerable difficulties. When the relationships between reported
organisation and concentration in practice were examined, the musicians who were well organised (5)
also exhibited high levels of concentration, while those who reported low levels of organisation (7) had
either low or moderate levels of concentration. The moderate planners (10) tended to have high levels of
concentration (9 of the 10). It seems that there is a relationship between level of organisation in
practice and level of concentration.
Preparing for performance
There was considerable individual diversity in relation to the level of preparation perceived as necessary
for performance itself beyond mastery of the work. Some musicians experienced considerable stage fright
but had developed strategies to cope:
"I'm not a natural performer. I never was any good at it. Partly because I was pushed far too
young. I know that if I haven't done enough practice I am going to be scared out of my wits so
I try to make sure nowadays that I prepare for it properly so that I can have a reasonable hope
of getting through it. If it is a very big event I take a beta blocker".
"I used to think it would be worse for having psyched myself up but having read a book on
tension and these issues where they suggest that you actually should think of it and also being
aware that when the adrenalin comes on suddenly you're doing yourself more harm than if
you've got it gently.... it's been going for a couple of days or whatever".
"I regard that first playing through as practice for the occasion. Because on the occasion
you've got to play it through from cold".
"I have found that if I practise immediately before it that helps. I try and breathe deeply, you
know the usual things to try and calm nerves, the ordinary straight forward things."
In contrast others felt that they needed an audience to "psych" themselves up, improve concentration and
give the performance a "spark". Nervousness can have positive as well as negative effects.
"I don't do any particular preparation for performance. I tend to feel that some kind of
automatic response comes into operation. I enjoy playing to an audience in public and this
engenders its own enthusiasm, its own spark of creativity or whatever and there is no need to
"psych" oneself up in any way. On the occasion of the performance it is simply a matter of
being prepared, practice, technique and then the music will come by itself providing you have
done the groundwork. I find I don't need any special preparation for the performance
situation".
Others were relatively unconcerned about performing to an audience.
"I'm too worried about actually playing the right notes. That's what I'm worried about, the
rhythm.........the audience is the last thing to worry about".
"I think the important thing if you are performing is to make your audience happy."
One musician described her relationship with performance as love-hate: she is frequently physically ill
before performances but says that there is nothing she likes doing more.
The interviews also revealed that stage fright can be transient and unpredictable. One musician reported:
"For a time I was afraid of being afraid".
Others reinforced this unpredictability indicating that even when playing the same programme in a series
where everything had previously been successful, one could be overcome by nerves. Of the twenty-two
musicians, 4 reported low arousal levels, feeling the need for an audience to enable them to perform
better; twelve experienced moderate levels; and six experienced high levels of arousal, which created
problems for them in performance. Arousal levels appeared to be related to levels of organisation and
concentration in
practice. Those who reported high levels of arousal in performance also reported high levels of
concentration in practice and high or moderate levels of organisation. Those reporting low arousal levels in
performance consistently reported low levels of concentration and organisation in practice. To explore
these relationships in greater depth more sensitive measures need to be developed.
Findings: Novice musicians
Analysis of the data from the novices revealed six advanced students (Grade 8 standard) whose practice
was qualitatively similar to that of the professionals, although they adopted a rather "taken for granted"
conception of performance: none raised the issue of spontaneity versus planning in performance, and there
was little evidence of specific "performance" preparation. The novices ranged in standard from beginners
to Grade 7. The development of expertise and strategy use was explored elsewhere (for details see Hallam,
1994; 1995a, 1997a). This paper focuses on issues relating to metacognition and planning.
Planning
There was an increase in the practice undertaken by 92% of the advanced students and novices as
examinations approached, with greater organisation of practice and more concentration on technical
aspects, e.g. scales. At other times the amount of time spent practising tended to depend on task
requirements. Not even those students contemplating a career in music felt that daily practice was
necessary to maintain standards. The number of days spent practising correlated non-significantly with
grade (r = .12) and age (r = -.02). Total time spent practising did increase with expertise, however,
correlating with age (r = .56, p = .0001) and grade (r = .51, p = .0001); this was because the length of
practice sessions increased.
Criteria were set out to distinguish high, moderate and low levels of planning in the prepared practice and
in normal practice. High levels of planning in the prepared practice were identified by evidence of the
completion of task requirements; speedy identification of difficulties; concentration of effort on difficult
sections; and integration of the sections practised into the whole for performance. Moderate levels of
planning were identified by completion of task requirements; evidence of on task behaviour but repetition
of large sections of the work rather than a focus on difficulties; and no integration specifically towards
performance. Low levels of planning were ascribed when the task was not completed; the first part of the
music was practised but not the remainder; and considerable amounts of time were spent off task. All of
the advanced students exhibited high levels of planning in their recorded practice, while 5 (12.5%) of the
novices did so. 28 novices (70%) showed moderate levels and 7 (17.5%) low levels.
In relation to daily practice, high planning was characterised by reports of specified aims of practice; a
consistent order of practice; self-imposed organisation of when practice was undertaken and a tendency to
mark things on the part. 4 novices (10%) and 2 advanced students (33%) were classified as having high
levels of planning in their daily practice. Moderate planning was categorised on the basis of some
organisation of when practice was undertaken; a planned order of practice when taking examinations and
evidence of some time organisation. 26 novices (65%) and 4 advanced students (66%) fell into this
category. Those categorised as having low planning skills reported practising when they had time;
constantly having to be reminded to practice, wasting time practising unnecessary material and being
disorganised in their work. 10 novices (25%) fell into this category. The advanced students demonstrated
considerable task planning in their prepared sight reading regardless of their normal planning of practice.
This level of planning may therefore be a feature of increased expertise and is perhaps a characteristic
necessary for becoming expert at playing a musical instrument.
Table 1 sets out the relationships between planning in recorded practice and organisation of daily practice.
The novices exhibited different levels of each kind of planning. As was demonstrated in the professional
group, there may be a need for conscious cognitive planning to override what are perceived as more
"natural" planning mechanisms or levels of expertise.
Table 1: Novice and advanced students' approaches to planning
Performance
The advanced students exhibited a similar range of behaviours to the professional sample. Some were
excited at the prospect of performance, others realised that nervousness marred their performance. Unlike
the professionals these advanced students had generally not developed successful coping strategies.
Amongst the novices, 90% reported being nervous on the day of the examination but a minority (38%) of
these reported nervousness occurring for several days in advance, some experiencing extreme headaches.
Others (10%) reported no nerves at the prospect of performance; some were excited.
69% of the novices adopted some kind of strategy (or more than one) to overcome nerves. The most used
strategy was to play to someone else prior to the examination (21 students). The second most popular
strategy was practising itself (8) followed by doing a mock examination (7) or arranging to be tested (6).
These strategies were part of performance preparations. Other strategies were utilised during performance.
These included treating the examination as if it were a lesson (3), avoiding thinking about it (3) and
actively concentrating on the music (1).
For the novices examinations were considered more important than public performances. An advanced
student suggested that this might be due to their concrete outcome, i.e. a mark. Although strategies were
adopted in relation to stage fright, they tended to focus on reducing the fear rather than on positively
alleviating any detrimental effects on performance. This had clearly not developed the same
significance as for the professional group.
Lack of concentration in practice was not reported by the novices or the advanced students. Perhaps young
people are generally less aware of their own internal states except in the case of nervousness, which
because of its severe physical symptoms, is difficult to ignore. Perhaps lack of concentration in practice is
perceived as boredom, a reason for terminating practice rather than a study problem. For the professionals
with performance deadlines to meet and standards to maintain, this is not an option.
Conclusion
The findings indicate that 'expert' musicians have extensive metacognitive skills which enable them to
optimise their learning and performance taking account of their own strengths and weaknesses and task
and performance requirements. They adopt extensive support strategies to promote concentration in and
organisation of practice and to optimise arousal levels for performance. The strategies are adopted
consciously to compensate for what are perceived as naturally occurring deficiencies. It was possible to
identify strategy use in the advanced students and novices, but their strategies were less well developed
and did not have a well defined focus on optimising performance. Taking the evidence together, planning
mechanisms seem to operate on three levels: first, planning related to task completion, which depends in
part on the level of expertise acquired; second, automated planning and organisation, which may be a
relatively consistent characteristic of the individual; and third, conscious, strategic planning, which
may compensate for deficiencies in the other planning mechanisms. Research utilising interval-level
measures is required to explore these relationships further.
Proceedings paper
Procedure
During the session the infant was seated on the assistant's lap. The experimenter was seated in another room and monitored the experimental
session through a one-way mirror. The infant was presented repeatedly with a standard sequence (S), the parent motif A, separated by
500-msec silence intervals. When the infant was quiet and facing directly ahead to the flashing green lamp, the experimenter initiated a
training trial, at which time a contrasting sequence was presented once at an intensity 6 dB higher than the previously heard standard
sequence. During the training phase, the contrasting sequence consisted of the parent motif B. If the infant turned his/her head at least 45°
toward the loudspeaker during the presentation of the training sequence (T) or the 500-ms silent interval that followed (total response time: 5
seconds), the experimenter recorded the turn and a cartoon was displayed (6 sec) after the end of the contrasting sequence. The video monitor
was under the loudspeaker, located 1 meter from the infant at an angle of 90°. Head-turns made at other times were not reinforced, nor were
head-turns of less than 45°. If the infant responded correctly on two consecutive trials, the intensity of the contrast stimulus was made
equivalent to that of the standard sequence. Subsequently, the infant was required to respond correctly on four consecutive trials in order to
meet the training criterion. Infants who failed to turn to the change on the first two trials were presented with the contrasting sequence 12 dB
higher than the standard on subsequent trials until they responded correctly on two consecutive trials, at which time the intensity of the
contrast stimulus was lowered 6 dB, and so on. The session was abandoned if the training criterion was not met within 20 trials. During the
testing phase, the standard sequence remained the same as in the training phase. The infant was presented with change and no-change trials,
the latter of which provided a measure of spontaneous rotation in the direction of the loudspeaker during the test session. The test session
comprised 36 trials, 12 no-change trials [Standard vs Standard, (S)] and 24 change trials [S vs Comparison, (C)], in random order. Among the
change trials, 3 consisted of variations of parent A and were considered not-to-be-responded-to change trials (C-, i.e. C- 1, C- 2, C- 3) and
3 of variations of parent B, which were considered to-be-responded-to change trials (C+, i.e. C+ 1, C+ 2, C+ 3), each repeated 4 times. The
infant was expected to turn the head more frequently for the change trials that involved variations of parent motif B. The infants were divided
into two groups in order to counterbalance the categories (in the first condition: parent A, A1, A2, A3 were, respectively, S, C- 1, C- 2, C- 3,
and parent B, B1, B2, B3 were, T, C+ 1, C+ 2, C+ 3; in the second condition: parent B, B1, B2, B3 were, respectively, S, C- 1, C- 2, C- 3, and
parent A, A1, A2, A3 were respectively, T, C+ 1, C+ 2, C+ 3).
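The training-phase intensity schedule described above can be read as a small state machine. A rough sketch under stated assumptions: the function name and the boolean-response encoding are illustrative, and details such as the response-time window are abstracted away.

```python
def run_training(responses, max_trials=20):
    """Rough sketch of the training-phase intensity schedule.

    responses: booleans, True if the infant turned on that trial.
    Returns (met_criterion, intensities), where intensities are in dB
    above the standard sequence.
    """
    intensity = 6       # contrast starts 6 dB above the standard
    consecutive = 0     # consecutive correct responses so far
    intensities = []
    for trial, turned in enumerate(responses[:max_trials]):
        intensities.append(intensity)
        consecutive = consecutive + 1 if turned else 0
        if trial == 1 and not any(responses[:2]):
            intensity = 12            # no turn on the first two trials
            consecutive = 0
        elif intensity > 0 and consecutive >= 2:
            intensity -= 6            # step the contrast down 6 dB
            consecutive = 0
        elif intensity == 0 and consecutive >= 4:
            return True, intensities  # criterion: 4 correct at equal level
    return False, intensities         # session abandoned after 20 trials
```

For example, an infant who turns on every trial passes through intensities [6, 6, 0, 0, 0, 0] and meets the criterion on the sixth trial.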
The pattern of results presented so far is far from ideal. One would have expected a non-significant d' for the not-to-be-responded-to trials in
the first analysis and a significant result in the second analysis, showing better discrimination for the to-be-responded-to trials. Nevertheless,
the fact that the different variations A and B were discriminated from their parents indicates that the infants might distinguish the members of
both categories. In fact, the second condition seems to be closer to the ideal profile, since according to a paired t test (2-tailed, t(9) = -1.987),
the d' in this condition tended (p = .0782) to be larger for the to-be-responded-to sequences (d' = 1.937) than for the not-to-be-responded-to
sequences (d' = .999).
Consequently, the first experiment does not allow us to conclude that a categorization process exists, since the infants responded virtually
equally to both types of change trials. Moreover, the sequence A3 seems to have a special status, since it tended to elicit more responses in the
first condition and did provoke more responses in the second one. This may be attributed to the high register of this passage, which might have
attracted the infants' attention and made this sequence more distinctive than the others. Mélen (1997, 1999b) showed that register changes were
powerful cues for 6- to 10-month-old infants, and these results give further support to that suggestion. One could also say that the categories
were not easy to discriminate, since the features which distinguish them did not lead to higher proportions of responses for the
to-be-responded-to sequences. Even if the d' for the to-be-responded-to sequences had been significant in the second condition, it could have
been attributed to one particular sequence rather than to categorical features as such.
Two questions arise at the end of this experiment: 1) is A3 really dissimilar to the other sequences? 2) Would categorization be manifest if
the difference between categories were increased? Experiments 2 and 3, respectively, were run to address these questions.
Experiment 2
Methods
Participants
Twenty female and 20 male young adults (mean age = 23 years, range = 15-29 years) took part in the experiment. None had ever received any
musical training.
Apparatus
The experiment was controlled by a computer (Macintosh Classic 4/20), which monitored the electronic equipment through a program designed with
the Max software. The stimuli, composed with the Performer software, were generated on line by a synthesizer (Korg 05R/W) with a built-in MIDI
interface and presented via two self-amplified loudspeakers (Altec Lansing ACS31).
Musical material
The Ländler n° 10 D145, op. 11 by F. Schubert served as stimulus.
Procedure
The participants were tested individually and presented with 45 pairs of sequences: all the possible pairings of sequences Parent A, A1, A2,
and A3, on the one hand, and all the possible pairings of sequences Parent B, B1, B2, and B3, on the other. Three control pairs were added,
i.e. A vs A, B vs B, and A vs B. Thus, there was a total of 15 distinct pairs, each presented three times. The task was to evaluate the degree
of similarity between the elements of a pair on a Likert-type scale ranging from 1 (totally dissimilar) to 7 (identical), mapped onto seven
keys of the keyboard. Participants responded by pressing the corresponding key. No time pressure was imposed. Each pair was replaced by the
next as soon as the participant had given his/her response, and participants were allowed to respond only after the second element of the
pair. The experiment, followed by a debriefing, lasted about 10 minutes.
Results and discussion
The three control pairs showed that the participants performed consistently: the mean rating was 6.8 (SD = .34) for the pair Parent A -
Parent A and 6.56 (SD = .51) for Parent B - Parent B, whereas it was 1.67 (SD = .80) for the pair Parent A - Parent B. These ratings are very
similar to those obtained by non-musician subjects for a similar task in other experiments (Deliège, 1996).
Two one-way ANOVAs, with pair (excluding the control pairs) as a within-subjects factor, were run on the mean similarity ratings and yielded
highly significant effects (for the A motifs: F(5,195) = 16.66, p < .0001; for the B motifs: F(5,195) = 27.51, p < .0001). A Newman-Keuls post
hoc analysis revealed that the participants considered all the pairs containing the sequence A3 to be dissimilar from the other pairs (p <
.05). The sequences B and B1 were judged quite similar: the participants gave this pair an average rating of 6.1, higher than any other rating
(Newman-Keuls significant at p < .05). The similarity ratings for the other pairs ranged from 4.2 to 4.66 (see figure 1B).
This experiment confirms that sequence A3 appeared dissimilar to the other sequences derived from the parent A. The results of the present
experiment should also be compared to those of the experiment by Mélen, Praedazzer and Deliège (this symposium), which showed that 7-9 year-old
children did not consider this part of the Ländler to be a member of the category derived from the parent A. The high register of this
sequence may make it so distinctive that young listeners were misled and isolated it from its category. The infants in the first experiment
could have experienced the same misleading impression.
Experiment 3
Methods
Participants
The participants were 11 healthy, full-term infants ranging from 8 to 10 months of age. One infant was excluded from the sample, because
she failed to meet a predetermined training criterion. The final sample comprised 4 males and 6 females, with a mean age of 7 months, 6
days.
Apparatus
See experiment 1.
Musical material
The parent motif A from the Ländler n° 10 D145, op. 11 by F. Schubert used in the first experiment and the Finale Rondo of Diabelli's Sonatine
n°2 served as stimuli. The latter is described in Koniari, Mélen and Deliège (this symposium). It was chosen because it presents the same
structure as the piece by Schubert. The parent motif B and its derivatives (B1, B2, B3) were extracted from this piece and opposed to the
parent motif A and its derivatives from Schubert's piece. We thought this could enhance the distinctiveness of the categories, since the
sequences of the Rondo are clearly different from the sequences of the Ländler except in terms of length.
Procedure
The same procedure as in experiment 1 was followed in the third experiment.
Results and discussion
As in experiment 1, the raw data consisted of the number of head-rotations on change and no-change trials. Globally, the infants responded to
21.62 % of no-change trials, to 25.83 % of change trials involving variations of parent A (hereafter, C- trials) and to 61.67 % of change
trials involving variations of parent B (hereafter, C+ trials).
A d' was calculated for each type of change trial. Each actual d' was compared to the expected value in the case of no discrimination, i.e.
zero. A one-sample Student t test (1-tailed) was run and led to the following results: for the C- trials, d' = -.021,
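The d' values quoted throughout can be recovered from hit and false-alarm proportions under the conventional signal-detection definition d' = z(H) - z(F). A minimal sketch, using the Experiment 3 group proportions reported above (61.67% responses on C+ trials, 21.62% on no-change trials) purely as an illustration; the authors' per-infant computation may differ:

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Group-level proportions from Experiment 3: C+ responses vs no-change responses
print(round(d_prime(0.6167, 0.2162), 2))
```

A d' near zero, as reported for the C- trials, means the response rate to those changes was essentially the same as the spontaneous head-turn rate on no-change trials.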
In conclusion, the infants appeared to be more efficient in this experiment than in the first one, since they responded mainly to the C+
trials, whereas the C- trials provoked fewer responses, even when they comprised the A3 sequence from Schubert's piece.
The results of this experiment are more readily interpretable in terms of categorization than those of the first experiment. Indeed, the
infants neglected the variations of the standard sequence and responded to the variations of the comparison sequence. It could be concluded
that they detected the feature common to the sequences - the cue - and responded to the sequences that shared the same cue when it was
reinforced by the cartoon.
One could argue against the categorization hypothesis from the non-significant d' for the C- trials: it could mean that the infants simply
could not discriminate between the parent motif and its variations because they could not perceive the differences between items of the same
category. The answer to this argument is a matter of empirical evidence, and that experiment remains to be done. In the meantime, our results provide
several pieces of indirect support for the categorization hypothesis. In the first experiment, a d' significantly different from zero was observed for the
C- trials of the first condition. Yet in this condition the C- trials were made up of variations of parent A of Schubert's piece. This revealed, therefore, that
infants did actually discriminate the variations of the Standard sequence from the Standard sequence itself. In the third experiment, when
Schubert served as the Standard category, infants categorized each sequence appropriately. However, for the second condition using
Diabelli's sequences as Standard category, it cannot be concluded that the infants categorised as well, i.e. that they perceived each member of the
Standard category as a distinct exemplar of that category. Nevertheless, we can suppose this was really the case, since there was no difference
between the two conditions for the d' associated with the C- trials. If the infants categorized Schubert's sequences, there is no reason to conclude
they did not categorize Diabelli's sequences.
General discussion
References
Behl-Chadha, G. (1996). Basic-level and superordinate-like categorical representations in early infancy. Cognition, 60, 105-141.
Clarkson, M. G., & Clifton, R. K. (1985). Infant pitch perception: Evidence for responding to pitch categories and the missing fundamental.
Journal of the Acoustical Society of America, 77, 1521-1528.
Deliège, I. (1987). Le parallélisme, support d'une analyse auditive de la musique : vers un modèle des parcours cognitifs de
l'information musicale. Analyse Musicale, 6, 73-79.
Deliège, I. (1996). Cue abstraction as a component of categorization processes in music listening. Psychology of Music, 24, 131-156.
Deliège, I., & Mélen, M. (1997). Cue abstraction in the representation of musical form. In I. Deliège & J. Sloboda (Eds.), Perception
and Cognition of Music (pp. 387-412). Hove: Psychology Press.
Jolicoeur, P., Gluck, M. A., & Kosslyn, S. M. (1984). Pictures and names: Making the connection. Cognitive Psychology, 16, 243-275.
Kuhl, P. K. (1991). Human adults and human infants show a "perceptual magnet effect" for the prototypes of speech categories,
monkeys do not. Perception & Psychophysics, 50, 93-107.
Lécuyer, R., & Poirier, C. (1994). Categorization in the five-month-old infants. Cahiers de Psychologie Cognitive, 13(1), 193-509.
Mandler, J. M., Bauer, P., & McDonough, L. (1993). Separating the sheep from the goats: Differentiating global categories. Cognitive
Psychology, 23, 263-298.
Mélen, M. (1997). Les principes du même et du différent comme organisateurs du groupement rythmique chez les nourrissons de six à
dix mois. Thèse de doctorat en psychologie non publiée, Université de Liège, Liège.
Mélen, M. (1999a). Les principes du même et du différent comme organisateurs du groupement rythmique chez le nourrisson:
Arguments théoriques. Musicae Scientiae, 3(1), 41-66.
Mélen, M. (1999b). Les principes du même et du différent comme organisateurs du groupement rythmique chez le nourrisson: Une
étude empirique. Musicae Scientiae, 3(2), 161-191.
Mervis, C. B. (1987). Child-basic object categories and early development. In U. Neisser (Ed.), Concept and conceptual development
(pp. 201-233). Cambridge (UK): Cambridge University Press.
Quinn, P. C. (1987). The categorical representation of visual pattern information by young infants. Cognition, 27, 145-179.
Quinn, P. C. (1998). Object and spatial categorization in young infants: "what" and "where" in early visual perception. In A. Slater
(Ed.), Perceptual development: Visual, auditory, and speech perception in infancy (pp. 131-165). Hove: Psychology Press.
Proceedings paper
Music and Spatial-Temporal Reasoning: The most recent and robust evidence suggests that musical training seems to
have the greatest effects on other cognitive abilities, specifically children's spatial-temporal skills, with no effects on
general intellectual abilities or on spatial abilities. Longitudinal studies have shown that training is most effective if on
a keyboard instrument, for a period of more than 2 years, and begun before the age of 7 (Rauscher & Zupan, in press;
Rauscher, 2000). The cortical model proposed by Leng & Shaw (1991) and adopted by Rauscher and colleagues
ascribes these transfer effects to early brain plasticity. This implies that children older than 7 cannot be subject to any
further improvements in spatial-temporal reasoning as a result of musical training.
Yet studies with children aged 10-11 seem to result in similar enhancements in spatial-temporal reasoning
(Costa-Giomi, 1997). Research with adults also indicates that differences in spatial-temporal abilities can be explained
by later aspects of musical engagement. These include current engagement in musical activities (Plumb & Cross, 2000)
or having studied music theory alongside practical musical training (Lamont, 2000). These studies also indicate that
studying science subjects either at age 16-18 or at university can be related to spatial-temporal improvements for
university students. These findings highlight the need to undertake research in a more contextually-grounded manner,
spanning a wide age-range, and to include data on a wide range of non-musical abilities and experiences, in order to
further explore the assumed relationship between musical training and specific cognitive abilities.
Music and Achievement. Folk beliefs about the effects of music on academic achievement are prevalent (cf. Sloboda &
Davidson, 1996). Music instruction during childhood does not itself seem to result in straightforward improvements in
achievement in school-leaving examinations; university students with more years of music training tend to have higher
levels of academic achievement, yet gender and parental educational achievement are more powerful predictors of
differences in students' own levels of academic achievement (Lamont, 2000). Nonetheless, it is typically those children
with higher levels of academic achievement who are selected for music instruction, and thus this issue is carefully
examined here as an important component in the network of real-world relationships involving music and achievement.
Musical Representations: Studies indicate that whilst music instruction is important for developing higher-order
musical representations, musical experiences alone can also lead to sophisticated abstract representations of music
(Lamont, 1998a, 1998c). The current project extends these ideas to questions outside the musical field. Intervention
studies involving music instruction do not account for children's motivations to engage in musical activities (cf. O'Neill
& Sloboda, 1996), and it is likely that within the experimental groups studied there is also a diversity of levels of
understanding music. It is suggested that measuring musical representations, rather than measuring or providing music
instruction, may give a more accurate picture of the extent to which music has an impact on children's musical thinking
in a real-world context. It is proposed that whilst music instruction may lead to changes in cognition in other domains,
this may be mediated by an intervening variable of musical representation or understanding (that may develop without
formal music instruction), which therefore must be included in any systematic study of the effects of music on
cognition.
Musical Opportunities: Whilst all children between 5 and 14 years participate in compulsory school music education in
the UK, the opportunities for music instruction or participation in musical activities vary greatly, and often depend on
financial support from parents (ABRSM, 1994; Sharp, 1991). National surveys also illustrate a diversity in the kinds of
musical opportunities provided by the different local education authorities in the UK (Cleave & Dust, 1989).
Small-scale studies show that children's involvement with music instruction is strongly influenced by parental
encouragement and the qualities of music teachers (Sloboda & Davidson, 1996) in addition to children's own
motivations (O'Neill & Sloboda, 1996). Finally, university students from higher socio-economic categories who do
embark on musical training during childhood are more likely to continue with their training for longer periods of time
(Lamont, 2000). These diverse findings point to the need to include a broad range of variables relating to the children's
own and their families' engagement with music and the effects of school culture upon these opportunities, alongside a
consideration of socio-economic status, in any rigorous study of children's musical and cognitive development.
Significance: This research proposes a more complex theoretical model of the network of relationships between
children's musical and non-musical abilities and achievements (Figure 1 below), and provides a pilot test of the model
to assess the impact that music can and does have on children's thinking, in a highly ecologically valid context.
It is argued that the simple relationship between musical training and spatial-temporal abilities found in previous
research may be complicated by a number of hidden mediating variables. Improvements in spatial-temporal abilities
may be due to musical training, musical experiences, or academic achievement in science. Further, parental
socio-economic status and parental educational achievement may also play a large role in influencing many of the
variables shown below (particularly in terms of take-up of music instruction and children's academic achievements).
Finally, children's real musical experiences and understandings are carefully recorded and assessed (rather than
randomly assigning children to 'music' and 'no-music' conditions), and the variables outlined in Figure 1 are
considered as interrelated and integrated aspects of children's thinking.
Figure 1
(Single arrows indicate unidirectional effects. Double-headed arrows indicate hypothesised bi-directional relationships.
Question marks indicate hypotheses that are not yet proven; the absence of question marks indicates accepted causal
relationships.)
Aims
This research studies children's musical experiences in an ecologically valid context, incorporating real-world
preferences and opportunities into the design, in order to explore the relationships between children's musical and
non-musical abilities and achievements. The project has two main research questions:
1) What is the relationship between children's musical experiences (formal music instruction, informal
musical experiences, and musical representations) and their academic achievements (English, maths and
science)?
2) What is the relationship between children's musical experiences (formal music instruction, informal
musical experiences, and musical representations) and their non-musical cognitive abilities (general,
spatial, and spatial-temporal)?
These two questions make no assumptions about simple or directional relationships, instead admitting the possibility
that improvements in either area may result in improvements in the other. For example, children with high levels of
academic achievement may be more likely to choose to and be allowed to participate in musical activities (particularly
music instruction), which are often seen as 'optional extras' that might be distracting for academically less able
children. Similarly, children who excel at specific academic subjects (notably mathematics and science) may also be
those who are able to or choose to achieve high levels of musical expertise. Both questions are contextualised by a
broad range of background biographical data concerning children's actual take-up of musical opportunities, aspects of
their home and school lives in relation to music, family involvement with music, parental educational achievement and
socio-economic status.
Method
Children from every year-group aged between 5 and 18 were drawn from different schools in the Midlands of England.
Data was collected in two stages.
Stage 1: This part of the study focused on the relationship between musical experiences and academic achievement
(Question 1 above). Data was gathered via questionnaires from both children and their parents regarding their
experience of formal musical training, informal musical activities, music in the home, and individual and institutional
preferences, opportunities and constraints on children's participation in music. Data was also collected on parents'
occupations and parental levels of education attained. Children's academic achievement was measured from school
records, Statutory Assessment Tests (taken by UK school children at age 7, 11, and 14 in English, mathematics and
science), and public examinations (taken at age 16 and 18) as appropriate. Class teachers, music teachers, and heads of
schools were also interviewed to explore the structure of musical opportunities afforded to the children and to shed
light on any unusual circumstances (either musical or academic).
Stage 2: This part of the study focused on the relationship between musical experiences and non-musical cognitive
abilities (Question 2). It comprised a series of cognitive tests, including group measures of musical representation (as
developed by Lamont, 1998c), measures of spatial-temporal and spatial reasoning (following Rauscher et al., 1997),
and measures of general analytic intelligence (Raven's Standard Progressive Matrices). Children's responses in Stage 2
were contextualised by the data gathered in Stage 1.
Results
Analysis of results is currently being undertaken and will be presented at the conference. The strategy for analysis will:
• explore the relationship between children's musical and non-musical abilities and achievements for each year
group. This will enable the effects of musical experiences and achievements to be more carefully assessed on a
range of cognitive abilities and achievements, and comparisons in both fields to be made on the basis of
children's age, amount and type of musical experiences.
• evaluate the contribution that thus-far neglected variables such as socio-economic status and musical
representations make to the profile of abilities and achievements. This will provide a preliminary test of the
model outlined above in terms of mediating factors between musical training and enhanced spatial-temporal
reasoning.
• focus on comparisons across different schools to assess the impact that music provision has on children's
musical and non-musical abilities. This will enable a consideration of how far children's socio-cultural context
affects their musical development in terms of preferences, attitudes, and achievements.
The results will be important in providing a preliminary empirical test of the model proposed above. This will set the
groundwork for a broader longitudinal study involving a larger sample of children which will trace the development of
children's musical experiences and non-musical cognitive abilities over time.
References
ABRSM (1994). Making Music: The Associated Board Review of the Teaching, Learning and Playing of
Musical Instruments in the United Kingdom. London: Associated Board of the Royal Schools of Music.
Cleave, S. & Dust, K. (1989). A Sound Start: The Schools Instrumental Music Service. Windsor: NFER-Nelson.
Costa-Giomi, E. (1997). The McGill Piano Project: Effects of piano instruction on children's cognitive abilities.
Proceedings of the Third Triennial ESCOM Conference, Uppsala University, Sweden, 446-450.
Hallam, S. & Price, J. (1998). Can the use of background music improve the behaviour and academic
performance of children with emotional and behavioural difficulties? British Journal of Special Education,
25(2), 88-91.
Lamont, A. (1998a). Music, Education, and the Development of Pitch Perception: The Role of Context, Age,
and Musical Experience. Psychology of Music, 26, 7-25.
Lamont, A. (1998b). Response to Katie Overy's Paper, "Can Music Really Improve the Mind?" Psychology of
Music, 26 (2), 201-204.
Lamont, A. (1998c). The Development of Cognitive Representations of Musical Pitch. Unpublished PhD
Dissertation, University of Cambridge.
Lamont, A. (2000). University Students' Musical Experiences, Musical Representations, and Cognitive
Abilities. Paper presented to the SPRMME conference on The Effects of Music, University of Leicester, April.
Leng, X. & Shaw, G.L. (1991). Toward a Neural Theory of Higher Brain Function Using Music as a Window.
Concepts in Neuroscience, 2(2), 229-258.
O'Neill, S.A. & Sloboda, J.A. (1997). The Effects of Failure on Children's Ability to Perform a Musical Test.
Psychology of Music, 25, 18-34.
Overy, K. (1998). Can Music Really Improve the Mind? Psychology of Music, 26(1), 97-99.
Plumb, J. & Cross, I. (2000). A generalised effect of music education. Paper presented to the SPRMME
conference on The Effects of Music, University of Leicester, April.
Rauscher, F.H. (2000). Musical Influences on Spatial Reasoning: Experimental Evidence for the "Mozart
Effect". Paper presented to the SPRMME conference on The Effects of Music, University of Leicester, April.
Rauscher, F.H. & Zupan, M.A. (in press). Classroom Keyboard Instruction Improves Kindergarten Children's
Spatial-temporal Performance: A Field Experiment. Early Childhood Research Quarterly.
Rauscher, F.H., Shaw, G.L. & Ky, K.N. (1993). Music and spatial task performance. Nature, 365, 611.
Rauscher, F.H., Shaw, G.L., Levine, L.J., Wright, E.L., Dennis, W.R. & Newcomb, R.L. (1997). Music training
causes long-term enhancement of preschool children's spatial-temporal reasoning. Neurological Research, 19,
2-8.
Rauscher, F., Spychiger, M., Lamont, A., Mills, J., Waters, A., & Gruhn, W. (1998). Responses to Katie
Overy's paper, 'Can music really 'improve' the mind?' Psychology of Music, 26(2), 197-210.
Sloboda, J.A. & Davidson, J.W. (1996). The young performing musician. In: I. Deliège & J. Sloboda (Eds.),
Musical Beginnings: Origins and Development of Musical Competence, Oxford: Oxford University Press,
171-190.
Sharp, C. (1991). When every note counts: The schools' instrumental music service in the 1990s. Slough:
National Foundation for Educational Research.
Sloboda, J.A., Davidson, J.W., Howe, M.J.A & Moore, D.G. (1996). The role of practice in the development of
expert musical performance. British Journal of Psychology, 87, 287-309.
Weinberger, N.M. (1999). Can Music Really Improve the Mind? The Question of Transfer Effects. MuSICA Research Notes.
Weinberger, N.M. (2000). "The Mozart Effect": A Small Part of the Big Picture. MuSICA Research Notes, VII,
I, Winter.
Proceedings abstract
Sofia Dahl
sofia@speech.kth.se
Background:
Much attention has been paid to expressive timing in music performance but
relatively little of this research work has been devoted to percussionists.
Like all musicians, a percussionist will interpret the score, giving more
emphasis to certain notes. In percussion playing, the principal ways of giving a note
more emphasis are changes in duration and dynamic level. It seems reasonable to
assume that these two performance parameters are used more distinctively in
percussion playing than for most other instruments. This fact ought to be
clearly reflected even in the performance of a simple drumming task.
Aims:
This study investigates the timing in three subjects during the performance of
a simple drumming task.
Method:
Three professional drummers were recorded playing repeated single strokes with
interleaved accents every fourth note in three different tempi and at three
different dynamic levels. The separation of the strokes in time, the inter-onset intervals (IOIs), was
analysed, and the players' timing performances were compared across the different conditions.
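The IOI measure can be made concrete with a short sketch. The onset times below are hypothetical, not the recorded data; the coefficient of variation is one common (assumed, not stated in the abstract) index of timing stability.

```python
import numpy as np

def inter_onset_intervals(onsets_ms):
    """IOIs are the differences between successive stroke onset times."""
    return np.diff(onsets_ms)

# Hypothetical onset times (ms) for strokes played at a nominal 250 ms IOI
onsets = np.array([0.0, 252.0, 498.0, 751.0, 1000.0])
iois = inter_onset_intervals(onsets)
mean_ioi = iois.mean()
cv = iois.std() / mean_ioi  # coefficient of variation: timing variability
```

Comparing such mean IOIs and variability measures across tempo and dynamic-level conditions is the kind of analysis the abstract describes.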
Results:
Conclusions:
Proceedings paper
Background
Absolute pitch (AP; sometimes also "perfect pitch" or "positive pitch") is the ability to identify or produce tonal pitches
without reference to an external standard (Takeuchi & Hulse, 1993). This auditory long-term memory for pitches seems
to be based on the perceptual coding of tone chromata, is reported to have a low prevalence in humans (less than
1 in 10,000, at least in its "perfect" form), and has been subject to nature/nurture debates for more than a century (cf.
Stumpf, 1883; Meyer, 1899; Bachem, 1937, 1940; Miyazaki, 1988; Takeuchi & Hulse, 1993; Chin, 1997; for a recent
synopsis see Ward, 1999). Considering neurophysiological evidence, however, the principle of tonotopic organization
throughout the auditory projection pathway suggests that "absolute" information about pitches should be available in
every human's primary auditory cortex (see, e.g., Romani, Williamson, Kaufman & Brenner, 1982; Pantev, Bertrand,
Eulitz, Verkindt, Hampson, Schuierer & Elbert, 1995; Pantev, Oostenveld, Engelien, Ross, Roberts & Hoke, 1998).
Accordingly, it has been proposed that non-possessors of AP may also have access to "latent" long-term pitch
representations in auditory memory (e.g., for the pitches of well-known tunes): Long-term memory for musical keys in
spontaneous pitch production (Halpern, 1989; Levitin, 1994) or pitch recognition (Terhardt & Ward, 1982; Terhardt &
Seewann, 1983), i.e., active and passive "absolute tonality," can be understood as weakened forms of AP.
Aims
Using the 12 major key preludes from Johann Sebastian Bach's Well-Tempered Clavier, Terhardt & Ward (1982) and
Terhardt & Seewann (1983) showed that musically literate subjects without AP performed above chance in
discriminating original keys even from one-semitone transpositions. However, these experiments did not reliably exclude
short-term memory judgments based on tone intervals (i.e., relative pitch cues). We aimed to scrutinize the finding of
musical key recognition in non-AP possessors by using a 24-hour inter-stimulus interval for rigorous short-term
memory interference (Hall, 1982) and updated technical means ("identical replication" by digital transposition).
Study Design
We presented 52 students without manifest AP with the first prelude in C major from the Well-Tempered Clavier, either
in the nominal key or digitally transposed to C-sharp, and tested their ability to discriminate between these two keys.
Each condition was presented seven times in a random sequence of 14 trials, one trial per day. As the two versions were
identical except for their pitch difference of one semitone, subjects without AP were expected not to achieve a
discrimination rate above chance level (7 correct judgments or 50%).
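The chance level for this design can be made concrete with a binomial sketch (an illustration, not part of the study's analysis): with 14 independent trials and a guessing probability of .5, one can compute how often a given score would arise by guessing alone.

```python
from scipy.stats import binom

N_TRIALS, P_CHANCE = 14, 0.5

def p_at_least(k):
    """Probability of k or more correct judgments out of 14 by guessing alone."""
    return binom.sf(k - 1, N_TRIALS, P_CHANCE)
```

For example, `p_at_least(8)` falls below .5, so scores of 8 or more already exceed the expected chance performance of 7 correct judgments.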
Testing Material and Procedure
A digital recording of the C major prelude (BWV 846) was duplicated in C-sharp on an electronic piano (Yamaha
Clavinova, CLP-840); both versions were recorded with a Digital Audio Tape Deck (Pioneer D-05). N = 52
non-AP-possessors (mostly students aged 17-18), who declared themselves familiar with the piece, were tested in single
sessions, one trial per day only, on 14 subsequent days. Full-length recordings (2'13") were presented in random order, 7
times each, via stereo headphones. Unlike in the Terhardt studies, no written musical material (score of the prelude)
was provided. While or after listening to the piece, participants had to judge which version they had actually heard and to
mark their dichotomous decision on a response sheet. No feedback was given.
Results
Even with our rigorous testing mode, participants clearly outperformed chance (see Figure 1, left panel). With a mean hit
rate of 8.2 (59%, SD = 1.8 or 13%), the close-to-normal score distribution is significantly shifted to the right compared to
a chance distribution (one-sample t-test, p < .001, effect size δ = 0.7). Performance was still slightly better (M = 8.7 or
62% hits, SD = 1.5 or 11%; between-groups p = .017) in participants with piano playing experience (Figure 1, right
panel). Except for this familiarity / musical expertise effect, we found no other moderating effects, such as training
effects (trials 1-7 vs. 8-14, p = .542), or systematic choice preferences (e.g., for the "white" key; Takeuchi & Hulse,
1991).
Conclusion
The small, but stable effect that we found points to the existence of a rudimentary ability for absolute pitch recognition.
There is increasing evidence about "latent" forms of AP being more widespread, at least among individuals with some
musical pre-experience, than traditionally assumed. Thus, it seems more adequate to adopt a continuum view of AP
instead of maintaining a discrete distinction between "possessors" vs. "non-possessors."
References
Bachem, A. (1937). Various types of absolute pitch. Journal of the Acoustical Society of America, 9, 146-151.
Bachem, A. (1940). The genesis of absolute pitch. Journal of the Acoustical Society of America, 11, 434-439.
Chin, C. S. (1997). The development of absolute pitch. In A. Gabrielsson (ed.), Proceedings of the Third Triennial
ESCOM Conference (pp. 105-110). Uppsala, Sweden: Uppsala University, Dept. of Psychology.
Hall, D. E. (1982). "Practically perfect pitch:" Some comments. Journal of the Acoustical Society of America, 71,
754-755.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition, 17, 572-581.
Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies.
Perception & Psychophysics, 56, 414-423.
Meyer, M. (1899). Is the memory of absolute pitch capable of development by training? Psychological Review, 6,
514-516.
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors. Perception & Psychophysics, 44,
501-512.
Pantev, C., Bertrand, O., Eulitz, C., Verkindt, C., Hampson, S., Schuierer, G., & Elbert, T. (1995). Specific
tonotopic organization of different areas of the human auditory cortex revealed by simultaneous magnetic and
electric recordings. Electroencephalography and clinical Neurophysiology, 94, 26-40.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical
representation in musicians. Nature, 392, 811-814.
Romani, G. L., Williamson, S. J., Kaufman, L., & Brenner, D. (1982). Characterization of the human auditory
cortex by the neuromagnetic method. Experimental Brain Research, 47, 381-393.
Stumpf, C. (1883). Tonpsychologie (Vol. 1). Leipzig: Hirzel.
Takeuchi, A. H., & Hulse, S. H. (1991). Absolute pitch judgements of black- and white-key pitches. Music
Perception, 9, 27-46.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin, 113, 345-361.
Terhardt, E., & Seewann, M. (1983). Aural key identification and its relationship to absolute pitch. Music
Perception, 1, 63-83.
Terhardt, E., & Ward, W. D. (1982). Recognition of musical key: Exploratory study. Journal of the Acoustical
Society of America, 72, 26-33.
Ward, W. D. (1999). Absolute pitch. In D. Deutsch (ed.), The Psychology of Music (2nd ed., pp. 265-298). San
Diego: Academic Press.
Proceedings paper
Introduction
Since vocal pedagogues cannot directly view the vocal mechanism, they rely on perceptual cues to
help them determine an individual's voice classification. Traditionally, voice classification has been
based on three perceptual parameters: range, timbre, and tessitura (Vennard, 1967); however, these
parameters are poorly defined and the interrelations between them are unknown.
To date, no research has been conducted that examines the interrelationship of pitch, tessitura, and
timbre as predictors of voice classification. Most research studies have focused on the acoustic
correlates of one parameter, timbre.
The accepted definition of timbre is as follows: two tones are of different timbre if they are judged to
be dissimilar and yet have the same loudness and pitch (ANSI, 1973). To define timbre for the vocal
instrument, an additional restriction is required. Not only must the two sounds be of the same pitch
and loudness, they must also be of the same vowel quality. Using such a definition, a singer would
have an individual timbre for each pitch-vowel combination. Yet this is not how vocal timbre has
been treated traditionally. Cleveland (1977) states that an individual singer has a characteristic timbre
that is a function of the laryngeal source and vocal tract resonances. Singers with similar timbres,
then, constitute members of the same voice timbre type or voice category. It is possible, however, that
any two voices may be perceived as having similar timbre on one pitch-vowel combination and
dissimilar timbre on another. In this case, each voice possesses a set of timbres. It may not be possible
to devise one simple acoustic measure that can accurately classify voice timbre types.
Research has shown a correlation between timbre type classification and average formant frequency,
with basses having lower formant frequencies than tenors (Cleveland, 1977) and sopranos having the
highest formant frequencies (Dmitriev & Kiselev, 1979). Since vocal tract length is directly related to
formant frequency, it is believed that this physical attribute contributes to voice quality. Yet when the
data provided by Dmitriev and Kiselev are examined closely, there is some evidence that vocal tract
length may not be related to voice classification in females. They observed distinctly different and
increasingly shorter vocal tract lengths for the voice categories of bass, baritone, and tenor,
respectively. However, vocal tract lengths for the voice categories of mezzo-soprano and soprano
were nearly identical. Likewise, the center frequency of the fourth formant decreased and showed
little overlap for basses, baritones, and tenors, respectively. Yet a great deal of overlap in fourth
formant frequency was observed for mezzo-sopranos and sopranos. These data suggest that the
acoustic correlate of vocal tract length, formant frequency, may be a perceptual cue used to assess
voice classification in males, but does not appear sufficient to differentiate the traditional voice
categories in females.
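The inverse relation between vocal tract length and formant frequency discussed above follows from the standard uniform closed-open tube approximation, F_n = (2n - 1)c / 4L. The sketch below uses hypothetical tract lengths purely to show the direction of the effect; real vocal tracts are, of course, not uniform tubes.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air

def tube_formants(tract_length_cm, n_formants=4):
    """Resonances of a uniform tube closed at one end: F_n = (2n-1)c/4L."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * tract_length_cm)
            for n in range(1, n_formants + 1)]

longer = tube_formants(18.0)   # hypothetical longer (bass-like) tract
shorter = tube_formants(16.0)  # hypothetical shorter (tenor-like) tract
```

Every formant of the longer tract lies below the corresponding formant of the shorter one, which is the pattern Cleveland and Dmitriev & Kiselev report for male voice categories.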
Acoustic and temporal cues have been shown to be important in the perception of speech and musical
instrument timbre. Every vibrating body has natural resonating frequencies. The natural resonating
frequencies of the vocal mechanism are known as formants and are known to influence timbre
perception. Additionally, four temporal parameters may be of importance in the perception of vocal
timbre: onset (e.g., Darwin, 1981; Grey & Gordon, 1978), vibrato rate and extent (McAdams &
Rodet, 1988), and spectral variation. In cases where steady-state spectral information is not sufficient
for timbre perception, such as instances where fundamental frequency is high and harmonics are
therefore widely spaced, these cues may provide additional acoustic information.
This paper attempts to investigate the perceptual validity of singing voice classification systems as
they relate to two parameters, pitch and timbre. Previous studies (e.g., Cleveland, 1977) have
examined the perception of voice classification using forced-choice paradigms based on the traditional
voice classification system of bass, baritone, tenor, alto, mezzo-soprano, and soprano. Such perceptual
experiments provide information as to how listeners place stimuli when provided with arbitrary
classification categories. They do not provide information on the perceptual validity of the categories.
The question that begs answering is this: if provided with no classification system a priori, how do
listeners tend to group vocal stimuli? Do they group them in a manner that supports current
classification systems? When grouping vocal stimuli, is the perception of timbre truly independent of
pitch?
This study employed two research paradigms in order to examine the perceptual dimensions of
classical voice classification. First, multidimensional scaling procedures were used to discover the
dimensions underlying vocal timbre. Second, an "oddball" paradigm was used to assess whether
timbre, independent of pitch, can be used as a perceptual cue to group vocal stimuli into traditional
voice categories.
Method
Stimuli
Master's level singers from the Department of Music at the University of Tennessee, Knoxville
provided stimuli for the experiment. These subjects met the following criteria:
1. Bilateral hearing within normal limits as determined by a 20 dB hearing screening;
2. Voice study at the Master degree level or higher;
3. No voice problems at the time of taping as determined by a certified speech-language
pathologist.
Two singers from each voice classification, mezzo-soprano and soprano, were recorded singing the
vowel /Y/ on six different pitches, A3, C4, G4, B4, F5, and A5, at a constant loudness level. The
resulting 24 stimuli were used in two perceptual experiments.
Recordings were made in a single-walled sound booth (Acoustic Systems RE-144-S). Subjects were
recorded while producing a sustained /Y/ for 5 seconds using a digital audio tape recorder (Sony
PCMR500) and a Sennheiser MD 441-U microphone. Subjects stood in the center of the booth. Lip to
microphone distance was 12 inches. A keyboard was used to present pitches. Prior to taping, subjects
were allowed to vocalize freely and become comfortable with the recording environment.
Listeners
All listeners in this study were experienced vocal professionals. Listeners were recruited from the
Knoxville Choral Society and the Knoxville Opera Company. Twelve listeners were recruited who met
the following criteria:
1. Bilateral hearing within normal limits as determined by a 20 dB hearing screening;
2. Bachelor's degree or higher in a vocal arts related discipline (e.g., pedagogy, performance, or
choral conducting) or 5 years' experience in a vocal arts discipline.
Procedure
Two listening experiments were conducted. Both experiments took place in a single-walled sound
booth (Acoustic Systems RE-144-S). Stimuli were presented binaurally through Sennheiser
headphones. Listeners entered responses using a computer monitor and a mouse.
Experiment 1
Experiment 1 utilized a dissimilarity paradigm. From the 24 vocal stimuli, all possible combinations
of two stimuli (A and B) were constructed, resulting in 276 paired trials. Within each trial, stimuli
were randomly assigned as A or B. Trials were presented in randomized order in an ABBA format to
12 experienced listeners.
Subjects were asked to rate each pair on a dissimilarity scale (0-10). This was accomplished through
use of a horizontal scroll bar presented via a computer monitor. The left side of the scroll bar
indicated 0 while the right side of the scroll bar indicated 10. Subjects were instructed that they should
rate the two stimuli as 0 if they were identical and 10 if they were very different. Subjects were told
that they could use any measure between 0 and 10 to indicate the degree of dissimilarity. Subjects
were told not to use pitch as a factor in their ratings. All subjects were presented with test stimuli so
that they could become familiar with the computer interface and the task.
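The trial construction described above amounts to enumerating every unordered pair of the 24 stimuli. A minimal sketch in Python (the singer and pitch labels are illustrative placeholders, not the study's actual recordings):

```python
import itertools
import random

# 24 stimuli: 4 singers (2 mezzo-sopranos, 2 sopranos) x 6 pitches
singers = ["mezzo1", "mezzo2", "sop1", "sop2"]
pitches = ["A3", "C4", "G4", "B4", "F5", "A5"]
stimuli = [(s, p) for s in singers for p in pitches]
assert len(stimuli) == 24

# All unordered pairs: C(24, 2) = 276 paired trials
pairs = list(itertools.combinations(stimuli, 2))
assert len(pairs) == 276

# Within each trial, randomly assign the two stimuli as A or B,
# then present the trials in randomized order
rng = random.Random(0)
trials = [rng.sample(pair, 2) for pair in pairs]
rng.shuffle(trials)
```

With 24 stimuli the pair count follows directly from 24 × 23 / 2 = 276, matching the number of trials reported.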
Experiment 2
For Experiment 2, trials of three stimuli were constructed, using an oddball paradigm. In this version
of the oddball paradigm, two of the three stimuli in each trial were produced by the same singer at two
different pitches (X1 and X2), while the third stimulus was produced by a different singer (Y). For
each singer, three same-singer conditions were constructed: one pairing the pitches G4 and B4 (XG4
and XB4), a second pairing the pitches C4 and F5 (XC4 and XF5), and a third pairing the pitches A3
and A5 (XA3 and XA5). For each singer and each condition, the "odd" stimulus (Y) was varied across
the three remaining singers and across the pitches A3, C4, G4, B4, F5, and A5. This design created
216 trials based on 4 singers times 3 conditions times 3 singer-pairs times 6 pitches. For each trial,
stimulus order was randomized. The resulting 216 trials were presented in random order to 12
experienced listeners.
Prior to stimulus presentation, listeners were told that two of the three stimuli in each trial were
produced by the same person and that they were to choose the stimulus produced by the different
person. Listeners were allowed to replay each trial as many times as they needed. Listener judgments
were recorded via a computer interface.
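The factorial structure of the oddball design can be checked by direct enumeration. The sketch below uses hypothetical singer labels and reproduces only the design counts, not the actual stimuli:

```python
import itertools
import random

singers = ["mezzo1", "mezzo2", "sop1", "sop2"]
pitches = ["A3", "C4", "G4", "B4", "F5", "A5"]
# The three same-singer pitch pairings (X1, X2)
conditions = [("G4", "B4"), ("C4", "F5"), ("A3", "A5")]

trials = []
for x in singers:                 # singer providing both X1 and X2
    for x1, x2 in conditions:     # same-singer pitch pairing
        for y in singers:         # "odd" singer Y, varied across the rest
            if y == x:
                continue
            for yp in pitches:    # pitch of the oddball stimulus
                trials.append([(x, x1), (x, x2), (y, yp)])

# 4 singers x 3 conditions x 3 singer-pairs x 6 pitches = 216 trials
assert len(trials) == 216

# Stimulus order within each trial is randomized, as is trial order
rng = random.Random(0)
trials = [rng.sample(t, 3) for t in trials]
rng.shuffle(trials)
```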
Results
Experiment 1
Distance measures obtained from Experiment 1 were subjected to multi-dimensional scaling analysis
(MDS). The optimal MDS solution was found in 3 dimensions. Fit measures for the solution were as
follows: Stress = .18 and R2 = .80.
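The scaling step can be illustrated with classical (Torgerson) MDS, a simpler metric relative of the scaling procedure reported here; the dissimilarity matrix below is random placeholder data, not the listeners' actual ratings:

```python
import numpy as np

def classical_mds(d, k=3):
    """Classical (Torgerson) MDS: embed an n x n symmetric
    dissimilarity matrix d into k dimensions via double centering."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)           # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]         # take the top k
    vals = np.clip(vals[idx], 0, None)       # guard against negatives
    return vecs[:, idx] * np.sqrt(vals)      # n x k coordinates

# Placeholder: a symmetric 24 x 24 dissimilarity matrix, zero diagonal
rng = np.random.default_rng(0)
r = rng.uniform(0, 10, (24, 24))
d = np.triu(r, 1) + np.triu(r, 1).T
coords = classical_mds(d, k=3)
assert coords.shape == (24, 3)
```

In practice the averaged 24 x 24 dissimilarity ratings would take the place of the random matrix, and fit would be assessed with a stress measure as in the solution reported above.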
Only the first two dimensions will be discussed in this paper. Dimension 1 was highly correlated with
pitch (R = .83). Dimension 2 appears related to voice category and was moderately correlated with F2
through F4 frequency (R = .73). Dimensions 1 and 2 for all four singers are presented in Figure 1.
Figure 2 displays mean values for dimensions 1 and 2 calculated for each voice category, mezzo
soprano and soprano.
Figure 1. MDS dimensions for Experiment 1
In general, experienced listeners rated same-pitch stimulus pairs from the same voice category as
more similar than they did same-pitch stimulus pairs from different voice categories. However, at the
pitches F5 and A5, listeners rated all stimulus pairs as very similar, regardless of voice category.
Pitch was a primary factor in all judgments. Within each singer, stimulus pairs with larger pitch
differences generally were perceived as being more dissimilar than those with smaller pitch
differences.
Experiment 2
For each X1X2 pair (XG4XB4, XC4XF5, and XA3XA5), the percent correct identification of the oddball
stimulus (Y) was calculated as a function of pitch for two comparisons: Y in the same voice category
as X1X2 and Y in a different voice category than X1X2.
Results for each X1X2 pair are presented in Figure 3. The plot labeled "Y in Same Voice Category as
X1X2" provides a graphic representation of the ability to discriminate differences within the same
voice category. Conversely, the plot labeled "Y in Different Voice Category than X1X2" provides a
graphic representation of the ability to discriminate differences between voice categories.
Figure 3. Percent correct identification of Y stimulus as a function of pitch for all three X1X2
conditions.
XG4XB4 Condition
In the condition XG4XB4, experienced listeners were able to select the "oddball" stimulus with a high
degree of accuracy. Y stimuli from a different voice category than X1X2 were accurately identified
regardless of pitch. However, when Y stimuli were from the same voice category as X1X2, accuracy
of Y stimulus identification decreased as the distance between Y stimulus pitch and XB4 decreased,
dropping from 100% accuracy to approximately 70% accuracy.
XC4XF5 Condition
For the condition XC4XF5, experienced listeners were less able to select the "oddball" stimulus than
they were in the XG4XB4 condition. Y stimuli from a different voice category than X1X2 were
accurately identified more often than those from the same voice category as X1X2. Accuracy levels
for this comparison ranged between 40% and 55%. Y stimuli from the same voice category as X1X2
generally were identified at or below chance levels. Unlike in the XG4XB4 condition, correct
identification of the Y stimulus did not decrease as the distance between Y stimulus pitch and either
X1 or X2 decreased. In fact, for Y stimuli in the same voice category as X1X2, peak percent correct
scores were achieved when Y pitch equaled either XC4 or XF5.
XA3XA5 Condition
Listeners identified the "oddball" stimulus least accurately in the XA3XA5 condition. While Y was
more accurately identified when in a different voice category than X1X2 than when in the same
category as X1X2, accuracy levels generally were at chance or less. For both comparisons, the greatest
accuracy was achieved when Y pitch equaled XA3, when Y pitch equaled XA5, or when Y pitch was
nearly midway between XA3 and XA5. In all other conditions, accuracy levels were far less than
chance.
Discussion
Results of Experiment 1 suggest that the following three factors affect the perception of dissimilarity:
voice category, pitch difference, and high pitch. In general, listeners found voices in
different categories to be less similar than those in the same category. Listeners also rated
stimulus pairs as more similar when they were closer in pitch. Finally, listeners generally found all
stimuli to be highly similar at high pitches. Given these findings, several predictions concerning
Experiment 2 can be made:
1. Listeners should be more able to accurately identify Y stimuli when they are in a different voice
category than X1X2 than when Y is in the same category as X1X2;
2. Listeners' accuracy in Y stimulus identification should increase as the distance between Y
stimulus pitch and both X1 and X2 pitch increases.
3. Listeners should be less able to accurately identify Y stimuli when both Y and X2 occur at
pitches above F5 than when Y is in close proximity to X1 or X2 in a pitch range less than F5.
Summary
This research suggests that listeners use multiple cues when determining similarity of vocal stimuli.
Acoustic cues associated with voice categories, pitch difference, and high pitch all are associated with
perception of dissimilarity. However, experienced listeners also appear to use other strategies for
making similarity judgments of vocal stimuli. Further study is needed. Specifically, similarity
strategies for experienced and inexperienced listeners should be compared through examination of
degree of accuracy and error patterns for various listening conditions.
References
American National Standards Institute. (1973). Psychoacoustical terminology. S3.20.
New York: American National Standards Institute.
Cleveland, T. F. (1977). Acoustic properties of voice timbre types and their influence on
voice classification. Journal of the Acoustical Society of America, 61, 1622-1629.
Darwin, C. J. (1981). Perceptual grouping of speech components differing in
fundamental frequency and onset-time. Quarterly Journal of Experimental Psychology.
A. Human Experimental Psychology, 33A, 185-207.
Dmitriev, L., & Kiselev, A. (1979). Relationship between the formant structure of
different types of singing voices and the dimensions of the supraglottic cavities. Folia
Phoniatrica (Basel), 31, 238-241.
Grey, J., & Gordon, J. (1978). Perceptual effects of spectral modifications on musical
timbres. Journal of the Acoustical Society of America, 63, 1493-1500.
McAdams, S., & Rodet, X. (1988). The role of FM-induced AM in dynamic spectral
profile analysis. In H. Duifhuis, J. Horst, & H. Wit (Eds.), Basic issues in hearing (pp.
359-369). London: Academic Press.
Vennard, W. (1967). Singing, the mechanism and the technique. New York: Fisher.
Proceedings abstract
Background.
Aims.
The goal was to give a detailed description of a concert pianist learning a new
piece for performance. Studies of expert performance in other domains provided
hypotheses about characteristics to look for: early identification of
underlying structure, shaping of early decisions by long-term goals, extended
retrieval practice.
Method.
A concert pianist, the second author, recorded herself as she learned the third
movement (Presto) of the Italian Concerto by J.S. Bach. The pianist commented
on her goals as she practiced, and provided detailed reports about the formal
structure, about decisions made during practice (e.g., about fingering and
phrasing), and about retrieval cues attended to during performance. Effects on
practice were identified using multiple regression.
Results.
Several features of the pianist's practice were identified that may contribute
to eminent performance: Early use of the formal structure to organize practice;
early effects of interpretive and expressive features; extended practice of
retrieval cues; division of practice into intensive work and longer runs;
clustering of sessions separated by months without practice.
Conclusions.
Proceedings abstract
Australia
Background. Analysis and investigation of music from a psychological perspective has flourished during the past two
decades. Research has concentrated on examination of the perceptual and cognitive processes that mediate music
recognition, cue abstraction, and the identification of perceptual components of music. Recently, however, attention
has been given to the emotional aspects of music, such as the role of expressive timing and dynamics in performance,
the effects of emotion on performance and preference, and studies that utilise continuous emotional response measures.
However, the relationship between the cognitive organisation and segmentation of music and emotional response has
been neglected.
Aims. To test the hypothesis that emotional and physiological changes relate strongly to the perceived unfolding
musical structure, and that this perceived segmentation, in turn, relates to the music-theoretic and performance
structures.
Method. Both musicians and non-musicians were asked to listen to contrasting excerpts from pieces for string quartet.
Experiment 1 asked the participants to identify significant cues, and Experiment 2 identified the strength of temporal
placement of these perceptual segments within the musical framework. Using a 2-dimensional emotional space,
Experiment 3 measured both the emotion the participants believed the music was trying to convey, and the emotion
that participants experienced. Physiological changes in the participants were measured in Experiment 4. The results of
these experiments were compared with the structure of the pieces, both performed structure (measured through acoustic
analysis) and theoretic.
Results. Both musicians and non-musicians showed similar responses. Significant cues were found to correlate both
with significant emotional and physiological responses, as well as with important moments of theoretic and
performance segmentation.
Conclusions. Theoretic and performance musical segmentation has reality in perceived musical structure and in
emotional response.
Procedure
Participants were tested in intact music classroom groups. Every attempt was made to control for problems with group
testing, by arranging classroom chairs in rows facing front with at least 2 feet between each chair. Participants were asked to
listen quietly to the musical excerpts without moving or making faces which would indicate their response to the music and
were led through the four test items for each stimulus presented. Selections were ordered by page color to maintain group
cohesion to the process.
Results
Data were converted from the self-report forms to numerical equivalents and analyzed via the Statistical Package for the
Social Sciences (SPSS). A five-point scale was used to compute means and standard deviations for grade and gender groups,
with negative anchor statements weighted as one point and positive statements weighted as five points.
Reliability analyses were completed to determine whether the four items measuring preference responses to the music were
correlated with each other. Alpha coefficients reflecting the degree of correlation between the items were .7798 for
Pop/Stimulative, .8445 for Pop/Sedative, .8829 for Rap/Stimulative, .7709 for Rap/Sedative, .8839 for Classical/Stimulative,
.8749 for Classical/Sedative, .9140 for Alternative/Stimulative, .8435 for Alternative/Sedative, .8879 for Jazz/Stimulative,
and .8842 for Jazz/Sedative. Items were highly correlated with each other so the responses to the four items were totaled into
a single composite score for each musical stimulus, with a possible range from 4-20.
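The alpha coefficients reported above are Cronbach's alpha for the four preference items. A minimal computation on invented Likert-type data (the scores below are for demonstration only, not the study's responses):

```python
import numpy as np

def cronbach_alpha(items):
    """items: n_respondents x k_items matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of composite score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative data: four 1-5 Likert items for 8 respondents,
# constructed so the items are highly correlated
scores = np.array([
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [1, 2, 2, 1],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
])
alpha = cronbach_alpha(scores)
assert 0.9 < alpha <= 1.0  # highly correlated items yield a high alpha
```

High alpha values like those reported (.77-.91) justify summing the four items into a single composite score per stimulus.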
Examination of means and standard deviations in Tables 2 and 3 indicates that for all participants Popular/Stimulative,
Rap/Stimulative, Alternative/Stimulative and Rap/Sedative were generally preferred the most, while Classical/Sedative,
Jazz/Sedative, and Classical/Stimulative were preferred the least. These findings support those of prior studies by May
(1985), Greer et al. (1974), and LeBlanc (1979), who found that rock and popular music forms were most preferred by
children.
Table 2
Means, Standard Deviations, and Univariate Analysis of Variance of Grade Differences in Music Selection Preferences
Grade
Note: Style and activity levels are abbreviated; standard deviations are in parentheses.
*p < .05
** p < .01
*** p < .001
Table 3
Gender
Note: Style and activity level are abbreviated; standard deviations are in parentheses.
*p < .001
The results of this study suggest that Rap and Alternative music may also be added to the list of popular musical styles
preferred by the children of today. In addition, for every style grouping, the stimulative selection received a higher score, thus
indicating that "upbeat" rhythmic selections were more preferred than sedative, "flowing" selections.
The variable of grade is also of interest. First graders rated 7 of the 10 musical selections more highly than did second and
third graders, suggesting that first graders had broader, less differentiated preferences across varieties of musical style and
activity level. In contrast, third grade scores were consistently lower than those of first and second graders for 7 of the 10
selections, indicating a narrowing of preference responses with advancement in grade level. The main
effects of gender and grade on preference scores were analyzed through a multivariate analysis of
variance (MANOVA). Preliminary diagnostic testing did not reveal any serious deviations from
normality, and tests for homogeneity of variances did not show significant differences. The test assumptions for MANOVA
were met except that of independent observations, but since many precautions were taken in the procedure to address this
Order Tempo RhyI RhyII Pitch Timbre Dynamics Key Text Perf Affect
*1 5 7 2 4 4 5 4 5 5 5
2 4 4 6 4 4 4 5 3 2 3
3 6 4 6 6 6 5 7 6 6 7
*4 5 4 3 4 5 5 6 4 4 4
*5 4 5 6 4 3 5 4 4 4 2
6 4 4 6 3 4 3 5 3 3 3
*7 4 7 3 3 4 4 4 4 3 3
*8 5 6 5 5 5 5 6 4 4 6
9 2 2 6 7 6 4 6 5 5 4
*10 6 7 6 4 4 7 4 6 3 2
Note: Some musical characteristics are abbreviated. Numbering represents order of presentation to participants. Asterisks
indicate highest preferred selections.
Preliminary data analysis (in progress) of the scores included the creation of dot plots for each individual characteristic by
preference group. For all participants, regardless of grade level and gender, rhythm 1 (smooth versus percussive rhythm)
appeared to be a discriminative variable. Songs which had moderate to high levels of percussiveness were associated with
high preference scores while songs which were smooth in rhythmic texture were associated with low preference scores.
Faster tempos also appeared to be associated with high preferences, although songs which were not preferred represented a
wide range of tempos. Regularity of rhythm also appeared to be a discriminative factor for low preference scores.
Scatter plots revealed some possible interactions between variable combinations. For all subjects, the combination of
moderate to fast tempo and percussivity appeared to be related to high preference. Faster tempos and simple performances
were also related to higher preference scores. In addition, loud dynamics and simple performances also appeared to create
high preference decisions.
In contrast, the only combination of variables which appeared to be related to low preference was that of regularity and
"smoothness" of rhythm.
General Discussion
(Sample semantic-differential scales from the rating instrument: Smooth 1-7 Percussive; Irregular Rhythm 1-7 Regular Rhythm; Soft 1-7 Loud.)
References
Alpert, J. (1982). The effects of disc jockey, peer and music teacher approval of music on music selection and preference.
Journal of Research in Music Education, 30, 173-186.
Baker, D. S. (1980). The effect of appropriate and inappropriate in-class song performance models on performance
preference of third- and fourth- grade students. Journal of Research in Music Education, 28, 3-17.
Boyle, J., Hosterman, G., & Ramsey, D. (1981). Factors influencing pop music preferences of young people. Journal of
Research in Music Education, 29, 47-55.
Christensen, P., & Peterson, J. (1988). Genre and gender in the structure of music preferences. Communication Research,
15, 282-301.
Finnas, L. (1989). A comparison of young people's privately and publicly expressed music preferences. Psychology of Music,
17 (2), 132-145.
Flowers, P. J. (1981). Relationship between two measures of music preference. Contributions to Music Education, 8, 47-54.
Fung, C. V. (1996). Musicians' and nonmusicians' preferences for world musics: Relation to musical characteristics and
familiarity. Journal of Research in Music Education, 44, 60-83.
Geringer, J. M. (1982). Verbal and operant discrimination - preference for tone quality and intonation. Psychology of Music,
Special Iss., 26-30.
Giomo, C. J. (1993). An experimental study of children's sensitivity to mood in music. Psychology of Music, 21 (2), 141-162.
Greer, R., Dorrow, L., & Hanser, S. (1973). Music discrimination training and the music selection behavior of nursery and
primary level children. Bulletin of the Council for Research in Music Education, 35, 30-43.
Proceedings paper
Introduction
Feelings of swing, characteristic of some jazz performances, are both perceptually salient and definitionally
elusive. Put less formally, you recognize a swing feel when you hear it, but it's not at all clear just what "it"
is. One of the initial problems in studying swing is that the word is used to describe a wide variety of
phenomena. It can describe a style of music popular in the 1930s and 1940s, a feel of performance (e.g. as an
alternative to funk or Latin), or an ineffable quality which a performance can possess to varying degree.
(Swing as a quality is ineffable in part because, though most centrally describing music in the swing feel,
some jazz musicians feel that it is also found in music in other feels. This paper's hypotheses about
hierarchical impulse coordination may offer a partial explanation of this.) For the purposes of this paper,
'swing' will be understood as characterizing a competent performance in the swing feel; such a performance
should be identifiably in the swing feel and should possess a basic minimum of the quality of swing. Thus to
ask of a performance in the swing feel, "Does it swing?" will be taken as the equivalent of asking of an
utterance, "Was this utterance produced by a native English speaker?" To answer 'yes' to either question is to
identify the object as being acceptable and competently produced; it indicates membership in a category and
says nothing about location within that category.
The analogy with native English speakers can be pushed further. Just as an individual speaker will have both
a regional accent and unique individual characteristics, so swing feels vary both according to stylistic
currents within jazz and with the personal stamp of individual musicians. Both domains also offer research
problems of comparable complexity. When we recognize a specific person's voice, we do this on the basis of
so many individual but mutually-interacting parameters that a complete understanding of our mental
processing is a practical impossibility (Handel 1989). Accent recognition (here native vs. non-native
speaker) requires an abstraction out of those already complex factors, and thus poses an even greater
research challenge. The problem of characterizing swing requires a comparable abstraction out of already
complex objects, so we should not expect to find more than very incomplete conditions for swing, conditions
which are certainly not sufficient and often not even necessary.
Discussions of the sources of swing often focus on phenomenal accentuation of weak beats at the tactus level
and delay of weak beats at the first subtactus level, with particular attention given to the latter condition. It is
well known among musicians who do computer synthesis of jazz, however, that these conditions do not
suffice to produce a feeling of swing. Merely delaying weak eighths will not make music swing. (In notating
jazz, the quarter-note level is almost always the tactus level. This practice will be assumed in much of this
paper, and references to specific note values will always assume a quarter-note tactus. This will simplify the
discussion of different metric levels.) Furthermore, there is no archetypal rhythmic proportion of the swung
eighth to the quarter-note span in which it is embedded (hereafter "swing ratio"). This fact is a commonplace
among jazz musicians, and empirical verification of this was the subject of Experiment 1.
Experiment 1
In Experiment 1 recordings by eminent jazz drummers were studied for the relative durations of the swung
eighths played on the ride cymbals. Ride cymbals were chosen because they are extremely salient on a
sonogram, extending well above the frequency ceiling of the other instruments. Furthermore, the typical
swing pattern on the ride cymbals (all quarters plus offbeat eighths of beats two and four) provided plenty of
swung eighths to study. Drummers Art Blakey and Max Roach were chosen because they are particularly
flexible rhythmically. Because they are both ranked among the greatest jazz drummers, two important
assumptions may safely be made: that this flexibility is the product of intent and not of lack of control; and
that their performances fall within the bounds of acceptable practice.
Multiple portions of four tracks from two albums were imported as .aiff files and subjected to sonogram
analysis using the program AudioSculpt. In many cases markers corresponding to ride-cymbal hits were
placed algorithmically by AudioSculpt; in some cases they needed to be placed by hand. For each sample the
markers were exported as a list of timings. This list was analyzed by a PatchWork patch which calculated the
swing ratios, their average, minimum, and maximum, and their standard deviation. The results are tabulated
as Appendix 1. Swing ratios were found to range from 14% to 48%; this strongly confirmed the belief that
there is no archetypal swing ratio, and thus that rhythmic alteration is not a sufficient condition for swing.
Similar methods were used by Friberg and Sundström (1999), and similar results were obtained.
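The swing-ratio measure used in Experiment 1 reduces to simple arithmetic on onset timings. The sketch below assumes hand-extracted (beat, offbeat, next-beat) onset triples rather than AudioSculpt marker files, and the timing values are illustrative, not measurements from the recordings studied:

```python
import statistics

def swing_ratios(triples):
    """Each triple is (t_beat, t_offbeat, t_next_beat) in seconds.
    The swing ratio is the duration of the swung eighth (offbeat to
    next beat) as a fraction of the quarter-note span it is embedded in."""
    return [(nb - off) / (nb - b) for b, off, nb in triples]

# Illustrative timings at roughly 130 bpm (beat span ~0.46 s):
# a triplet-like feel (~1/3), a snappier one (~1/4), a straighter one (~0.45)
triples = [
    (0.00, 0.31, 0.46),
    (0.92, 1.27, 1.38),
    (1.84, 2.09, 2.30),
]
ratios = swing_ratios(triples)
summary = {
    "mean": statistics.mean(ratios),
    "min": min(ratios),
    "max": max(ratios),
    "stdev": statistics.stdev(ratios),
}
assert all(0 < r < 0.5 for r in ratios)
```

On this measure a straight eighth would give a ratio of .50 and a triplet eighth .33, so the observed range of .14 to .48 spans nearly the whole space from very snappy to nearly straight.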
The Role of the Performer's Body
While the results of Experiment 1 are important for demonstrating the non-existence of an archetypal swing
ratio, it should be stressed that such an archetype would not sufficiently explain the phenomenon of swing
even if it did exist. For example, melodic lines that contain no eighth notes can be perceived to swing. In the
shout chorus of a swing-feel big-band chart the band is often in rhythmic unison, and there may be no
consecutive eighth notes for lengthy stretches; in a good performance these passages will nonetheless be
perceived as swinging.
In looking for sources of swing that go beyond the rhythmic structure of consecutive eighths, a number of
researchers have focused on the role of structured asynchronicity among performers. Another issue
commonly discussed among jazz musicians, it has been most prominently discussed in the academic
literature by Keil (1987), and has been studied analytically by Prögler (1995), synthetically by Friberg
(1999), and using both approaches by Iyer (1999). While the importance of structured asynchronicity has
been demonstrated by the success of these studies, there are plenty of other factors which contribute to
swing, and some of them do not involve synchronization. For example, it is possible for a solo line without
accompaniment to swing, in which case there are of course no issues of synchronization.
As one factor behind the phenomenon of swing which is more general than structured asynchronicity, I
propose that there are characters of movement of the performer's body which leave sonic traces in the
articulations and timings of the notes, and that these characters of movement are important in generating
feelings of swing. The music swings because the performers' bodies swing. This fits well with Gunther
Schuller's (1989) frequently quoted definition of swing: "in its simplest physical manifestation swing has
occurred when, for example, a listener inadvertently starts tapping his foot, snapping his fingers, moving his
body or head to the beat of the music." (p. 223) Iyer (1999) also roots his methodology in the physicality of
performance; his account of structured asynchronicity is based on physical aspects of some African and
African-American cultural practices and values.
My proposal differs from previous work in its emphasis on the role of articulation. I believe that amount and
uses of muscular tension in the performer's body make a difference in the quality of the sound, that a note
played as a rebound sounds different from a note played as a main impulse. If our bodies tend to respond
physically to music, it seems plausible that they do so because of recognizing the character of motion of the
bodies that made the music. (The possibility that physical participation may affect perception will be
discussed briefly below.)
A substantive description of swinging characters of movement is well beyond the scope of this study. While
some characterization will be offered, for the moment it will suffice to argue for the plausibility of the concept.
First of all, it must be possible for subtle differences in muscular control to make a qualitative difference in
the sound produced. Iyer, for example, states that in playing certain drums "the only two elements at one's
disposal are intensity and timing." (1998, p. 105) In general I am skeptical of claims that any instrument
in which the performer's body is not entirely mechanically distanced from the means of sound production
(i.e., any instrument unlike the pipe organ) fails to respond to subtle differences in the way the performer
moves when playing. To stay with the ride cymbal example, this may seem initially to be too simple an
instrument to produce such shadings of sound; it might be argued that the location on the plate of contact
with the stick and the forcefulness of that contact are the only variables. But this neglects the fact that the
stick and the plate have a complex interaction. They are in physical contact for a finite period of time, and
during this time they will behave to some extent as coupled oscillators. The firmness with which the stick is
held, the angle with which it contacts the plate, and the degree to which the stick is controlled or allowed to
bounce freely will all affect the interaction of stick and plate, and thus affect the vibration of the plate.
Secondly it must be established that the listener can in fact reconstruct information about the performer's
body from the sonic traces thereof. This is intuitively plausible based on the extremely detailed knowledge
inferred from environmental sounds; more direct verification was the purpose of Experiment 2.
Experiment 2
In Experiment 2, computer-altered versions of ride cymbal patterns recorded by a jazz drummer were played
for subjects. In each of two example groups, the swung eighth note was moved in time but had its spectral
envelope preserved; the two groups were based on originals with differing swing ratios (.285 and .196). The
initial hypothesis was that when asked which version swung the best, subjects would choose the original
timing of the swung eighth based on articulation cues.
Experimental Method
Examples were based on recordings made of Dave Gluck, a nationally touring jazz drummer who is based in
New York. Dave played a 20" Zildjian Custom Dark Ride cymbal, and was recorded on DAT with sampling
rate 44.1 kHz using a stereo microphone placed between three and six inches from the cymbal. He was asked
to play typical ride patterns (all quarter-note beats plus offbeat eighths of beats two and four) with three
different feels: one in which the swung eighth was approximately equal to a triplet eighth, a snappier feel
which was more like a sixteenth, and a straighter one which was closer to a straight eighth. The feels were
recorded in that order, and before each a metronome was played at 130 beats per minute in order to have the
different feels have approximately equal tempo. Within each feel, after recording several measures I had him
play the same feel again twice, this time interrupting him by knocking his stick away in the middle of the
pattern. This was done after beat 2 and after the eighth that follows beat 2. The idea was that when moving
the swung eighth, either the off-beat eighth or the on-beat eighth that preceded it would have to be made
longer, and that it would be helpful to be able to copy the decay that would have occurred had the next note
not been played. The stick was knocked away, rather than simply asking Dave to stop the pattern, in order
that the physical motions which produced the note in question would be exactly those that would have been
used if the next note were going to be played.
The recordings made were imported as .aiff files and processed using BIAS Peak v. 2.0. Within each feel,
one half-measure excerpt was selected which seemed to swing particularly well. When it came to processing
the examples, however, problems arose due to my having underestimated the complexity of the sound of the
ride cymbal. Although I was explicitly testing its capacity to carry subtle articulation cues, I had not
observed its low-frequency behavior. In many musical settings the lower frequency components are not
particularly salient relative to other sounds in the same spectral region, and the envelope of these lower
components evolves relatively slowly. I had therefore not realized that they would present problems. What I
found was that hits with the same metrical position within the same feel were perceived as differing in
overall pitch; it was therefore not possible (with one exception) to mix the decay of one hit convincingly
with the attack of another. It was then necessary to find another way of extending the decay of some of the
hits. After trying a few different methods, I ended up simply copying the last few milliseconds of the hit and
pasting them in at the end. This usually involved only a few milliseconds (at most 23.5 ms pasted four times,
but that was extreme). I did not perceive this as causing any distortion of the sound. The only other method
used was in lengthening the swung eighth of the version with swing ratio .196. This note was problematic
because for some examples it needed to be lengthened considerably, and because it was interrupted by the
next hit very soon into its decay, at a point at which the envelope was evolving extremely rapidly. For this
set I ended up using as my model the last complete pattern from the take in which I interrupted the drummer
after the swung eighth. The swung eighth which ended that take was thus played almost immediately after
the swung eighth in the pattern used as a model, and they had the same overall pitch. I was therefore able to
splice the decay of the final eighth onto the swung eighth from the basis pattern.
Three groups of seven examples each were produced. The first group was based on a half bar from the
approximately triplet feel and had swing ratio .285. The second was based on the snappier feel that was more
similar to a dotted rhythm and had swing ratio .196. The third group was based on the straighter feel and had
swing ratio .35. In each group the overall duration of the pattern was kept constant, and the swung eighth
placed in seven different positions, ranging from ratio .42 to ratio .18 in increments of .04. In each case the
basis pattern lasted from the attack of beat one until just before the attack of beat four. The span from the
attack of beat two until just before the attack of beat four was then copied and pasted in repeatedly until there
were three full bars, concluding with the first beat of the fourth bar (i.e. ending just before the attack point of
the second beat of bar four). Once these groups had been generated, the entire third group was discarded as
sounding awkward; something about the feel of the original half bar made it not conducive to exact
repetition. The first and final examples of each group were also discarded as obviously not swinging, leaving
five examples in each group ranging in ratio from .38 to .22.
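The placement scheme described above can be sketched computationally. This assumes (as the source does not state explicitly) that the swing ratio denotes the duration of the off-beat eighth as a fraction of the quarter-note beat, so that a triplet feel corresponds to roughly .33 and a straight eighth to .5; the helper names and printed output are illustrative, not from the original processing.

```python
# Sketch of the swung-eighth placements used to build each example group.
# Assumption: swing ratio = (duration of off-beat eighth) / (quarter-note beat),
# so the swung eighth's onset falls (1 - ratio) * beat_ms after the beat.

TEMPO_BPM = 130  # the metronome tempo used in the recording session
beat_ms = 60_000 / TEMPO_BPM  # ~461.5 ms per quarter note

# Seven positions from ratio .42 down to .18 in increments of .04
ratios = [round(0.42 - 0.04 * i, 2) for i in range(7)]

for r in ratios:
    onset_ms = (1 - r) * beat_ms
    print(f"ratio {r:.2f}: swung eighth onset {onset_ms:.1f} ms after the beat")
```

On this reading, discarding the first and final examples leaves the five ratios from .38 to .22 that were actually used.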
There were several problems intrinsic to this method of constructing examples. In order to isolate the
relationship between swing ratio and articulation, it was necessary to repeat the same pattern exactly several
times. (It might have been possible to use examples only a half bar long, but it would have been very
difficult to judge such a short example as either swinging or not swinging.) The exact repetition, however,
created a situation which is basically never heard in real jazz; even the most consistent drummers do not use
such absolutely exactly repeating patterns. The second problem was that the perceived lower frequency pitch
of the cymbal changed over the course of the half bar. Pasting on a repetition thus led to an abrupt
change in the perceived main lower frequency; such abrupt changes are never heard in real cymbal playing.
The result was a strange oscillating melody of sorts in the lower frequency range. (Readers wishing to hear
some of the examples should visit my website, "www.columbia.edu/~jpi9".)
Once the examples had been generated, I realized that the coding of bodily motion via articulation cues did
not create such strong restrictions as I had supposed. I was correct in predicting that the same swung eighth,
moved in time from its original position, would not only sound rhythmically different but also sound like it
had been produced with a different character of motion. It was true that the same eighth, at first relaxed and
natural, would gradually become stiff and awkward. But this did not happen as quickly as I had expected.
Rather, there was a considerable range across which the character changed, but in which the examples still
sounded musically viable.
In an attempt to verify the importance of articulation cues, I created a number of other examples which were
identical to existing ones in temporal and peak intensity profiles but which were assembled out of hits
which were not originally proximate (and which should therefore be
expected to have little relation to each other in terms of structure of articulations). In making various
examples I used a single on-the-beat hit, a variety of on-the-beat hits, or hits which retained their original
metrical position but which came from different feels. To my ear those examples ranged in the order
presented from stiff and unmusical to fairly reasonable. At the least, this indicates to me that on-the-beat hits
and off-the-beat hits are, in fact, qualitatively different.
Examples were put into pairs and triplets for comparison, with sets falling in three categories. Within each of
the two groups derived from one of the original feels, the five examples were divided into two overlapping
triplets, with swing ratios ranging from .38 to .30 and from .30 to .22. These four sets comprised the first
category. There were also five pairs, in which the examples having the same swing ratio from each of the
two groups were compared. Those were the second category. Finally, the third category consisted of six
pairs and one triplet in which the more synthetically generated examples were compared with the originals
which they were imitating. All sets were then randomly ordered, and within each set order of presentation of
the examples was also randomized. Once in sequence, the 37 examples comprising the 16 sets were burned
as separate tracks onto CD-R; they were distributed over two CD-R's for ease of track control.
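The set and track counts given above can be checked with a quick tally; the category labels in the comments are mine, following the three categories just described.

```python
# Tally of the comparison sets and total tracks described above.
# Category 1: two overlapping triplets per example group, for two groups.
# Category 2: five same-ratio pairs across the two groups.
# Category 3: six pairs and one triplet comparing synthetic examples with originals.

sets = (
    [3, 3] * 2        # category 1: four triplets
    + [2] * 5         # category 2: five pairs
    + [2] * 6 + [3]   # category 3: six pairs and one triplet
)

num_sets = len(sets)    # 16 sets
num_tracks = sum(sets)  # 37 tracks in total

print(num_sets, num_tracks)
```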
Jazz musicians have extremely divergent attitudes about research into the inner workings of the music.
While some are eager to think about and discuss such issues, others find the attempt to explain that which is
ultimately unexplainable at best futile and at worst offensive. Although I do in fact agree that the workings
of swing will never be completely explained, I decided that it was safest and easiest not to indicate to the
subjects that swing was the object of my investigations. They were therefore informed that the experiment
looked at differences between jazz and classical musicians in how they perceive and characterize swing
rhythms. Subjects were asked to rate the examples within each set numerically, and were also asked to
write a few words about how they perceived the differences in character. Subjects were allowed to use the
same number more than once if they felt that there was no difference in their preference of some of the
examples. Subjects were given a booklet in which to record responses; the booklet listed the tracks grouped
into sets and had room for comments after each set. The subjects were allowed to control the CD player
themselves and listened through headphones.
Subjects were eight members of the Manhattan Jazz Orchestra and headlining vocalist Johanna Grüssner. All
are professional jazz musicians based in New York. Subjects were given a small honorarium in appreciation
for their help. Most of them completed the task in approximately fifteen minutes. Due to a technical problem,
five of the subjects had sound coming from only one side of the headphones; the responses of these subjects are
not statistically differentiable from those of the others, however.
Results
From the standpoint of verifying my hypothesis, the experiment was an almost complete failure. It was,
however, instructive in other ways.
The only definitive results came in the sets which compared triplets on the basis of swing ratio. It was
assumed that the effects in question would be strong, and that therefore any real difference should be
virtually unanimously agreed upon. No statistically meaningful conclusions can be drawn from anything less
than virtually complete agreement with such a small number of subjects. For the first feel, comparing ratios
between .38 and .30, .30 was preferred by all eight subjects who expressed any preference. .34 was preferred
over .38, but too weakly to be statistically meaningful (5-2). Comparing ratios between .30 and .22, .30 and
.26 were tied (5-4) for first preference; all nine agreed that .22 was the worst. For the second feel, .30 was
preferred over .34, which was considered better than .38; two expressed no opinion, and one subject in each
case could not decide between the adjacent pairs, but no responses contradicted that ordering. Comparing ratios between
.30 and .22, results were much more spread out: each example received one of the three rankings four
times, another three times, and the remaining one twice. .30 and .26 were equally preferred, both receiving four number one rankings. .22
seemed least preferred, receiving four number three rankings, three twos, and two ones. From a statistical
point of view, however, these differences are not meaningful.
It may support my hypothesis that with the group which was based on an original swing ratio of .285, a ratio
of .22 was unanimously felt to be worse than either .26 or .30, whereas when the original ratio was .196, all
three ratios were approximately equally preferred. For this to strongly support the theory, though, it would
be expected that when the two examples with ratio .22 were compared directly, the one with original ratio of
.196 would be strongly preferred over the one with original ratio .285. This was not the case; the preference
for the one with .196 original ratio was far too weak to be meaningful. I don't think that this experiment
succeeded in demonstrating anything at all about the role of articulation cues. I would also caution against
any reductive conclusions about swing ratios based on the results of the timing comparisons. The
preferences do seem quite clear, but they are in the absence of a musical context, and at one specific tempo.
While, at a given tempo, some ratios may be more normative than others, in the right musical context ratios
much more extreme than those tested here can swing.
Of the five pairings of examples with the same swing ratio from the two example groups, none produced
significant preferences. Similarly, of the sets which compared more synthetic examples with the originals
which they imitated, only one pair produced significant results, and in that pair the synthetic example had an
extremely noticeable pitch mismatch. Most surprisingly, the synthetic example which to my ear was the
most awkward and unmusical was actually preferred over its original by 5 subjects. Results are tabulated
more fully in Appendix 2.
The real value in the results of Experiment 2 lies in the focus it puts on the subjectivity of swing. In the
introduction I stressed the difficulties in characterizing swing which result from the variety of music which
is felt to swing. Swing is also difficult to characterize because it intersects with individual values which vary
from person to person. It was not uncommon to have two subjects identify the same parameter as marking
the most salient difference between examples, but to evaluate the examples in the opposite way based on the
same parameter. For example, they might agree that one example has more accentuation on beats two and
four than the other, but while one will hear this as good, the other will hear it as exaggerated. Or, in the case
of the synthetic example which had the extreme pitch mismatch, one subject liked the synthetic example,
saying that it "has an interesting two tone thing going on." Another subject, though, complained that "He's
fucking around with where he/she puts the stick on the ride in [the synthetic example]."
Another way in which responses were subjective was in the choice of parameters to give attention to. Even
such simple examples offered a variety of different potential foci of attention. One subject, for example,
made almost no comments other than 'too stiff;' it seems likely that he was bothered by the absolutely
inflexible repetition of the patterns. Another subject (the same one quoted at the end of the previous
paragraph) felt that almost all of the samples were terrible and expressed very few preferences; his one
moment of enthusiasm came when he heard the most synthetic example, the one made with repetitions of
just one hit. He said that that example was "the best so far because the drummer is not fooling around with
stick placement and volume." He made exactly the same comment when he encountered that example in a
subsequent set; it seems likely that he was disturbed by the pitch mismatch. In general, subjects frequently
called attention to different parameters in their written responses.
Finally, responses to swing are subjective because listeners do not differ only in their evaluation of and
attention to their perceptions; they are also able to affect their perceptions by their behavior. I first noticed this in the
course of Experiment 1, listening to Max Roach's ride cymbal playing on Move. I found that when I listened
hearing the half note as the tactus (an incorrect hearing to a jazz musician) I didn't hear any swing in the ride
pattern at all. I noticed no delay of the off-beat eighth notes, and no accentuation of beats two and four.
When I listened hearing a quarter-note tactus, however, both the very subtle delay of the swung eighths and
the emphasis on two and four became audible; the line started to swing. I had a similar experience when
preparing the examples for Experiment 2. I found that when I was disposed to hear an example as swinging, when I
moved my body in response to it, even the worst examples had at least some swing to them. But if I sat
motionless as a detached and uninvolved listener, some of the better examples sounded sterile and
completely unswinging. Iyer frequently refers to listeners as coperformers (1998); from these experiences it
seems to me that swing is not simply a property of a sound signal, or even of a perception which may vary
from listener to listener but which is fixed for each signal-listener pairing. Rather, swing is, at least in part,
something which the listener participates in, and which the listener has the power either to encourage or to
resist.
Future Directions
In considering future versions of this experiment, there are some problems which could be readily solved,
but many others which would remain. The most obvious area of improvement is the example materials. As
manipulating sound files presents so much difficulty, it would seem desirable to precisely manipulate the
performer instead; the obvious solution is to use a computer-controlled piano which can both record touch
and timing data and play on the basis of touch and timing input from the computer. This would eliminate
problems such as pitch mismatch while still allowing a note to be moved in time yet remain unchanged in
character.
The problems related to subjectivity of response, however, would remain. This kind of investigation into
swing presents an experimental catch-22. In order to isolate individual parameters, it is necessary to remove
much of the context, as changing one parameter also changes the relationship between that parameter and the
others present. But removing context only opens the door for the listener's personal responses and
assumptions about context to play an even greater role. Furthermore stripped-down examples which are
perceived as raw materials for music rather than as music itself invite the listener's participation much less
than real, complex musical examples. It's not clear that it would be profitable to repeat this experiment with
the relatively cosmetic (though necessary) change of example source; future experiments bear substantive
rethinking.
Toward a Characterization of a Swinging Performer
As noted above, this paper will not offer a thorough account of swinging motions in performance. There are,
however, some observations about impulse coordination and hierarchy which can serve as the beginning of
such an account.
Unless notes are significantly isolated in time, a performer will not usually execute all notes with equal
impulses. Rather, there will be dominant impulses coinciding with hypermetrical beats; all other notes will
group together gesturally with the notes receiving dominant impulses, joining either the dominant note that
follows or the one that precedes. Each of those notes will receive a subordinate impulse. These subordinate
impulses will be less active, and have the character of a rebound.
An analogy with sawing wood may help convey the intended concept. Imagine first sawing through a piece
of dense particle board. The hardness of the wood, combined with its internally conflicting grain, makes it
difficult to saw. Each stroke is very active and must be independently initiated. The saw comes to rest at the
end of each push stroke, and the next pull must be entirely the product of a new impulse. In this case there is
no hierarchy of impulses. Contrast this with sawing balsa wood. It is so easy, and the saw moves through the
wood with so little resistance, that the pull feels like an easy rebound from the push. One main push may
even produce the next three as subordinate rebound strokes. In this case the rebound strokes receive impulses
which are subordinate to that of the main stroke. Note that these strokes are not literally rebounds; without
the intention of the person sawing, one stroke would come to an end and stop, even with balsa wood.
Although these strokes result from their own impulses, those impulses have less intentionality than the main
impulse; it feels almost as if they were happening of their own accord. It is in this sense that impulses are
hierarchized.
To return to a musical example, consider a string player or pianist playing a passage from Mozart featuring
running eighths in a quick tempo. Dominant impulses would fall either every measure or every half measure,
and the subordinate impulses would be relatively undifferentiated. This is necessary for the smoothness
which is stylistically typical; more frequent dominant impulses would make the passage sound beaty and
unmusical; they would also increase the physical tension of the performer. Note that this has the effect of
deëmphasizing the quarter-note level: dominant impulses occur less frequently than quarter notes, and note
changes occur more frequently. (Dominant impulses are usually audible, though not necessarily because of
being louder.)
Contrast this now with a jazz musician playing running eighths in a swing feel. While there will still be a
larger organization of four or eight notes, of which one will receive a dominant impulse, there will now be a
hierarchically intermediate impulse given to each note falling on a quarter-note beat. There is a gestural,
impulse-hierarchized pairing of notes. (It is not universal that the strong impulses fall on the beat; at slow
and moderate tempi the intermediate dominant impulses often fall on the weak eighths. At extremely fast
tempi this kind of syncopation is rarely heard as it becomes extremely difficult to execute.) This will
emphasize the quarter-note level, and the presence of more relatively superordinate impulses will also
increase the intensity. This hypothesis may account for those jazz musicians who swing even though they
use little or no rhythmic inflection. They are perceived as swinging because of the audible results of the
added middle level of impulse with its resultant strong-weak duple impulse pairs. The hypothesis may also
explain how pieces of music which are not in the swing feel may nonetheless seem to swing. Again, it is
important to state that the claim here is not simply that notes on the beat are louder. The claim is that the
difference in impulse control makes the action different; the difference in action makes the sound different.
It is a qualitative difference in sound that is being claimed as the crucial percept here.
This claim about differences in impulse hierarchy seems difficult to verify short of wiring performers to take
data about muscle use. Syllable choice in singing instrumental music, however, provides a way into these
issues. It is commonly observed that jazz and classical musicians sing instrumental music very differently;
classical musicians tend to vary the vowels they use less than jazz musicians. As the vowel chosen
determines the position of the vocal apparatus, and as these various positions involve known relative
amounts of muscular tension, analysis of syllable choice can yield patterns of relative tension and relaxation.
At first glance, the expected patterns conform very well to the hypothesis of impulse hierarchy. A classical
musician singing running eighths will often stay on one syllable, or change syllable infrequently, reflecting
locally undifferentiated levels of tension. A jazz musician, on the other hand, will often employ a pattern of
regularly alternating vowels, corresponding to a regular alternation in the level of muscular tension. Putting
this oversimplified analysis on an empirical footing was the purpose of Experiment 3.
Experiment 3
In Experiment 3 the same nine subjects used in Experiment 2 were recorded singing five instrumental lines
taken from well-known jazz compositions. They were told that the experiment looked at differences between
jazz and classical musicians in the ways that they communicate with other musicians by singing. They were
told not to approach the task like a singer giving a performance, but rather like a musician communicating
with another musician about how the music should go.
The examples chosen were as follows: the first alto part from the first sixteen bars of the first chorus (mm.
13-28) of "Basie - Straight Ahead" by Sammy Nestico, found in Wright (1982); the first alto part from the
first 8 bars of the fourth chorus (mm. 109-116) from the same chart; the flügelhorn part from the first 16 bars
of "Three and One" by Thad Jones, also found in Wright (1982); the first alto part from the first sixteen bars
of the second chorus, with the measure-long lead-in, (mm. 31-48) also from "Three and One"; and the
beginning of a lead sheet of Charlie Parker's "Ornithology," up to the downbeat of measure 12. The
examples each featured passages of running eighth notes in addition to other rhythms. The examples were
chosen for their familiarity, in the hopes both that the singing would be as expressive as possible and that
musicians would not need to concentrate on reading the music while singing; this was not intended to be a
sight-singing task.
The recordings have not yet been fully transcribed; they may well contain enough interest to be the subject
of a separate study. Initial processing reveals the following generalizations, however. Off-beat eighths are
typically either schwa sounds or not sung with vowels at all, but rather with the nasal voiced consonant /n/.
The schwa is produced with a very neutral, relaxed position of the vocal apparatus. On-beat eighths (and
some off-beat eighths which are tied into the next beat, functioning as anticipations) are typically more
active vowels, especially /i/ and /u/. /i/ is produced with the tongue high and in the front of the mouth, /u/
with the tongue high in the back of the mouth and with lips somewhat constricted. (Handel 1989, pp.
135-147) Not only are these both more active than the schwa, they are two of the three 'cardinal vowels', so
named because they are produced in the physically most extreme positions, thus bounding the space of all
vowels. (Handel 1989, p. 143) The most typical vowel patterns were based on regular alternations between
two vowels, with either /i/ or /u/ on the beat and the schwa off the beat. This fits perfectly with the
hypothesis of impulse hierarchy; the alternation between vowels that are maximally active and maximally
passive makes sense as an alternation between dominant impulses and rebounds.
One of the subjects in Experiment 3 was remarkable for his musical knowledge and musical memory. With
each example he glanced at the sheet of music to see what was wanted and then put down the music and
sang from memory. He frequently sang more than had been requested, and in the case of the opening of
"Three and One" he sang some of what the rest of the band does during the rests in the flügelhorn melody.
His singing was therefore purely the expression of an already-formed musical conception, unencumbered by
the parallel task of forming a conception based on reading music. His rendition of "Ornithology," transcribed
as Example 1, demonstrates strategic musical use of the physicality of vowel sounds. From here the paper
will follow the informal vowel notation used in Example 1.
Rather than using mainly one of the more active vowels, he used both, exploiting the space between them.
He moved from the initial doo to dee on the third beat of measure one, using dih as an intermediate point for
a smooth transition. He then created an intensification into the goal of the phrase, the down beat of measure
2, by a direct alternation of the distant doo and dee. In measure three he left off the alternation in the third
through fifth eighths in order to create a sense of headlong drive into the two syncopated quarter notes. In his
rendition the b-flat in measure 4 is heard as a surprise, a sudden turn in the melody which receives a very
different kind of accent from those that the syncopated quarters got. This was achieved through use of
syllables, as the dee is brighter than the dih of the quarters; it stands out particularly saliently because of
being preceded by a doo. The rest of the phrase is a relaxation, with less in the way of extreme contrasts. A
nice touch is the variety of upbeat accents, the first produced by a strong plosive d in the duh at the end of
measure 4, the second, on the second eighth of the second beat of measure 5, produced by use of a doo. A
similar effect was achieved in the next phrase, when he created a weak accent on the second eighth of beat
three by omitting the consonant on the preceding eighth. The anticipation of measure 9, which is heard as an
intensified sequence of the preceding phrase, is brought out by use of the bright bee, though somewhat
softened by the consonant choice. The accent on buh in measure 9 is very strong, set off by a complete stop
of breath just before it. It shows that different kinds of accent don't have to correlate, as both vowel and
consonant choice would make buh quite weak if not for the accent by physical intensification. The end of the
phrase is punched out with the direct dee doo alternation, both on quarter-note beats. The rest of the example
relaxes from that point of greater tension with more normative syllable alternations which do not juxtapose
cardinal vowels. In addition to providing very strong support for the hypothesis about impulse hierarchy, this
example shows that this kind of syllabic analysis could provide new analytic insights into scat singing.
Convergence of the Visceral and the Cognitive: Fast Hard Bop
So far this paper has concentrated on more visceral aspects of swing; there are of course also cognitive
aspects, and in the case of fast hard bop these can combine to begin to account for important aspects of style.
Issues of beat finding are implicit in the above discussion, as the presence of hierarchically intermediate
impulses coinciding with the tactus can aid the listener in finding the beat. These are not, however, the only
cues. Consider a metrical grid of the sort used in Lerdahl and Jackendoff (1983). It is perfectly symmetrical
between levels; any level could serve as the tactus as well as any other. Now apply this to a swing feel jazz
tune, and suspend the assumption that the quarter note gets the tactus. It would be typical to find phenomenal
accentuation on every weak quarter-note beat. As this would happen throughout much of the tune, it would
make it unlikely for the tactus to be at the half-note level or above, as that would mean that the entire piece
was syncopated. It would also be typical for each weak eighth-note beat to come late. As the tactus is
generally regular, this would make it unlikely for the tactus to be at the eighth-note level or below. (A
similar conclusion, though on different grounds, is found in Iyer 1998, pp. 117-118). This leaves the
quarter-note level as the only preferred level for the tactus.
Now consider fast hard bop tunes. These often have tempi above 200 quarter-note beats per minute. If a
classical piece had such a fast tempo, it is extremely unlikely that the quarter note would be heard as the
tactus. An experienced listener would likely hear a half note tactus for a classical piece with quarter-note
tempo 200 or above. A fast hard bop tune can also be heard with a slower half-note tactus, but experienced
listeners, aided by both the hierarchically intermediate quarter-note impulses and the cognitive cues
discussed above, will cling to the quarter-note tactus. (Jazz musicians will simply know that that's where the
tactus is, based on elementary grammatical cues.) Thus the visceral and the cognitive join forces to make
perceivable a tactus which lies considerably outside of normal limits. The performance is likely to be
intense, because of the tension involved in generating the tactus-level intermediate impulses. This is
compounded by the listener, who often pulses along with the tactus physically. Musicians playing fast hard
bop create intensity in many ways, but not least by this exploitation of cognitive and perceptual tendencies.
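The timing argument above can be made concrete with a small computation. The "comfortable" tactus band used below (inter-beat intervals of roughly 400 to 900 ms) is an assumption drawn from common rules of thumb about tactus perception, not a figure from this paper, and the 220 bpm tempo is an illustrative fast hard bop value.

```python
# Inter-onset intervals of candidate tactus levels at a fast hard bop tempo.
# Assumption: listeners comfortably entrain to inter-beat intervals of
# roughly 400-900 ms; values outside that band are flagged as such.

def ioi_ms(bpm: float) -> float:
    """Inter-onset interval in milliseconds for a beat rate given in bpm."""
    return 60_000 / bpm

tempo = 220  # illustrative fast hard bop tempo, in quarter-note bpm

levels = {
    "eighth note": tempo * 2,   # 440 bpm
    "quarter note": tempo,      # 220 bpm
    "half note": tempo / 2,     # 110 bpm
}

for name, bpm in levels.items():
    interval = ioi_ms(bpm)
    band = "within" if 400 <= interval <= 900 else "outside"
    print(f"{name}: {interval:.0f} ms ({band} the assumed comfort band)")
```

On these assumptions only the half-note level falls inside the comfortable band, which is why sustaining a quarter-note tactus at such tempi requires the visceral and cognitive reinforcement described above.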
Conclusions
Swing is a phenomenon of immense complexity which arises out of the interactions of many parameters and
which is in many ways subjectively perceived. This paper offers a very partial account of the causes of
swing in its proposal that there are modes of bodily coordination which are in some way swinging, and
which leave audible traces in articulation, timbre, rhythm, and intensity. We understand music to swing in
part because we decode the traces of swinging performers. One aspect of swinging bodily coordination, at
least in the case of consecutive eighth notes, is an alternation of dominant and subordinate impulses. This
theory of alternating impulses was strongly supported by analysis of spontaneous scat singing. Providing
along the way a new paradigm for the analysis of scat singing, this method offers a window into the
simultaneous manipulation of multiple parameters toward musically expressive ends. The explanatory value
of impulse hierarchy was further demonstrated when issues of impulse hierarchy and of cognition were
coordinated in the account of aspects of fast hard bop. Experiment 2, which failed to demonstrate that the
mode of motion of the performer actually has audible effects, was valuable for highlighting the subjectivity
of swing. In doing so it provided support for the main hypothesis. If it is true that we perceive swing more
strongly when our bodies start to swing, then it is plausible that the invitation to the dance might come through
sonic recognition of potential partners already out on the floor.
Acknowledgments
This research began in classes taught by Fred Lerdahl and Thanassis Rikakis, and they have both continued
their support and help throughout. Dave Gluck was extremely kind in allowing me to record his playing, and
in discussing the issues with which this paper is concerned; he even gave me a first lesson in playing the ride
cymbal. Johanna Grüssner and the members of the Manhattan Jazz Orchestra were also generous in putting
their musicianship and expert ears at my disposal, as well as their time and patience. Vijay Iyer helped me
through a fruitful conversation which included a contribution to the experimental design. Doug Abrams, the
music director of the Manhattan Jazz Orchestra, contributed to this research indirectly through many
conversations about music over the course of the last decade or so. Finally, Bret Horton spent a long day
assisting me in carrying out experiments. He ran Experiment 3 while I ran Experiment 2 concurrently;
without his help this research could not have been completed on time.
Bibliography
Friberg, A.K., & Sundström, A. (1999). Jazz drummers' swing ratio and its relation to the soloist. Presented
at the 1999 conference of the Society for Music Perception and Cognition.
Handel, S. (1989). Listening: An Introduction to the Perception of Auditory Events. Cambridge, MA, MIT
Press.
Iyer, V. S. (1999). Microstructures of Feel, Macrostructures of Sound: Embodied Cognition in West African
and African-American Musics. Ph.D. dissertation, University of California at Berkeley.
Keil, C. (1987). Participatory Discrepancies and the Power of Music. Cultural Anthropology, 2, 275-283.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA, MIT Press.
Prögler, J.A. (1995). Searching for Swing: Participatory Discrepancies in the Jazz Rhythm Section.
Ethnomusicology, 39, 21-54.
Schuller, G. (1989). The Swing Era: 1930-1945. Oxford, Oxford University Press.
Wright, R. (1982). Inside the Score. Delevan, New York, Kendor Music, Inc.
Back to index
Proceedings paper
A Stroop-like effect in hearing: Cognitive interference between pitch and word for absolute pitch
possessors
Kengo Ohgushi
Faculty of Music, Kyoto City University of Arts
13-6 Kutsukake, Ohe, Nishikyo-ku, Kyoto 610-1197 Japan
ohgushi@kcua.ac.jp
1. Introduction
The Stroop effect refers to cognitive interference between a color and a word that arises when a subject is asked to
name the ink color in which a color word is printed (Stroop, 1935). The response time in the incongruent condition
(e.g. word: red, color: green) is longer than in the congruent condition (e.g. word: red, color: red). The effect
appears to reflect the cognitive conflict produced when visual and semantic information contradict one another. This
kind of effect has not yet been reported in hearing. In our university, we have many music students with absolute
pitch. They can identify and vocalize the note name (= pitch name) of any musical tone they hear as /do/, /re/, /mi/
and so on. An interesting idea occurred to me: if a listener with absolute pitch hears a vocal tone uttered as one of
the seven note names (Do, Re, Mi, ..., Si) at a pitch different from the uttered note name, the listener may be
confused by cognitive interference due to the difference between the note name and the uttered word. If the note
name and the uttered word are the same, the cognitive interference should not occur. The aim of this study was to
discover whether such cognitive conflict occurs between auditory (pitch) and semantic (word) information. If it
does, then a Stroop-like effect occurs in the auditory system as well as in the visual system.
2. Method
2.1 Stimuli
A male singer vocalized each of seven words: /do/,/re/,/mi/,/fa/,/sol/,/la/,/si/, each with seven pitches: Do, Re,
Mi, Fa, Sol, La, Si, where the fundamental frequency of La tones was 442 Hz. These 49 vocal tones included 42
tones in which the vocalized word and the note name of the tone were different (incongruent condition) and 7 tones
in which the vocalized word and the note name were identical (congruent condition). These sung tones were
recorded on a digital audio tape (DAT) recorder.
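As a rough illustration of this 7 × 7 design, the stimulus set can be enumerated as follows. This is only a sketch of the design; the actual stimuli were recorded sung tones, not generated by code.

```python
# Enumerate the 7 x 7 stimulus design described above. A tone is
# "congruent" when the vocalized word matches the note name of its
# sung pitch, and "incongruent" otherwise.
NOTE_NAMES = ["do", "re", "mi", "fa", "sol", "la", "si"]

stimuli = [
    (word, pitch, "congruent" if word == pitch else "incongruent")
    for word in NOTE_NAMES       # the vocalized word
    for pitch in NOTE_NAMES      # the note name of the sung pitch
]

n_congruent = sum(1 for _, _, cond in stimuli if cond == "congruent")
print(len(stimuli), n_congruent, len(stimuli) - n_congruent)  # 49 7 42
```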
2.2 Subjects
Five students with absolute pitch served as subjects. They were graduate students majoring in music at Kyoto
City University of Arts and all of them were able to identify perfectly 60 notes from C3 to B7 played on the
piano.
2.3 Experiment
The stimuli were presented to subjects in a random order through headphones, and the subjects were asked to
respond by uttering the note name of each of the 49 tones as quickly as possible ignoring the word pronounced.
The stimuli and the response voices were recorded on the left and right channels of a DAT, respectively.
The response time (RT) in the congruent conditions (e.g. word:/do/, note name:Do) and the incongruent
conditions (e.g.word:/do/, note name:Re) were measured. Next, the subjects were asked to respond by repeating
the word of each of the 49 stimuli as quickly as possible regardless of the note name. Subjects practised 20 trials
before this listening experiment.
3. Results
3.1 Note-name response
Fig.1 shows the response time for each note-name judgment averaged over five subjects. The abscissa gives the
note name of each tone stimulus, and the ordinate represents the average response time in ms. Averaged data
over all notes, illustrated as the rightmost bars, indicated that the averaged response time for the congruent
condition RTc (=684 ms) was shorter than the average response time for the incongruent condition RTi (=807
ms). This difference is significant according to the t test (p< .05). But in the case of Sol and Si, RTc was slightly
longer than RTi. This seems to be due to the fricative /s/. This problem will be discussed later.
Fig.2 also shows the response time for each note-name judgment averaged over five subjects. It shows that RTc
is significantly shorter than RTd (p< .01), except for the /la/ voice. RTd for /sol/ and /si/ is markedly longer than
for the others. Examining the time envelopes of the /sol/ and /si/ tones revealed that the voiceless portion of the tones
continued for 200~300 ms. This means that the time from the onset of the fricative /s/ to the onset of the vowel
was comparatively longer than that for the other tones.
Fig.4 also shows the results of word response. For all words, RTc were significantly shorter than RTi (p< .01).
Fig.5 shows the individual differences in note-name responses. For all subjects, RTc was significantly shorter
than RTi (p< .01).
Fig.6 shows the individual differences in word response. For subjects AU and KT, RTc was almost the same as
RTi. RTc was not necessarily larger than RTi. The average RTc was not significantly shorter than the average
RTi.
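The reported comparison of mean response times can be sketched as a paired t test over subjects. The per-subject means below are invented for illustration; the paper reports only the grand means (RTc = 684 ms, RTi = 807 ms), not individual data.

```python
# Paired t test on hypothetical per-subject mean response times (ms)
# for the congruent (RTc) and incongruent (RTi) note-name responses.
from statistics import mean, stdev
from math import sqrt

rtc = [650, 700, 690, 660, 720]   # invented congruent means (ms)
rti = [780, 820, 810, 790, 835]   # invented incongruent means (ms)

diffs = [i - c for c, i in zip(rtc, rti)]  # per-subject RTi - RTc
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic, df = n - 1
print(round(t, 2))  # 41.0
```

With real data, the resulting t would be compared against the critical value for n - 1 degrees of freedom.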
4. Conclusion
It was shown that cognitive interference between sensory information (pitch) and semantic information (word)
occurs in audition as well as in vision; here it is called a Stroop-like effect. Further, a Stroop-like effect in
reverse, i.e. the effect in which RTc becomes shorter than RTi in a word shadowing task, was also observed.
This suggests that sensory information has a greater influence on semantic information in audition than in vision.
5. Acknowledgment
This study was supported by a Grant-in-Aid from the Ministry of Education of Japan, for Science Research
09871019 (1997~1999).
References
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology,
18, 643-662.
Back to index
Proceedings paper
Experiments on musical pitch perception have shown that intervals with equal frequency ratios are not always
perceived as the same by musicians, but are affected by the tonal context, or key, in which the intervals are heard
(Krumhansl, 1979; Krumhansl & Keil, 1982; Krumhansl & Shepard, 1979). Memory for the pitch of single tones
has also been shown to be affected by whether a series of tones intervening between the standard and comparison
tones is tonal or atonal (Dewar, Cuddy, & Mewhort,1977; Krumhansl, 1979). These findings are thought to result
from the use of an internal frame of reference (the scale) in the encoding of pitch. The present study is a test of the
hypothesis that nonmusicians' recognition memory for pitch, in the form of intonation judgments concerning notes
in familiar melodies will be affected by the tonal context in which the tones are heard. Specifically, it is expected
that differences in the tonal functions, or tonal stabilities, of the tones in their respective tonalities will contribute to
differences in the ability of listeners to perceive small changes in frequency when the notes are out-of-tune. Since
the more consonant steps in the scale are more easily remembered, it is predicted that they will also be more
discriminable from close neighboring tones.
Experiment One
In the first study, participants made intonation judgments concerning notes which were the second and fifth degree
of the scale in the key of the melody in which they were heard. On half of the trials, the tone being judged had an
absolute frequency of 256 Hz (middle C). On the other half, the frequency was 384 Hz (the G above).
Method
Participants: 27 musically-untrained listeners participated as part of one of the options
for fulfilling the experimental methodology activity requirement for an introductory
psychology course at Indiana University Purdue University Fort Wayne.
Materials: Eight melodies which were familiar to the listeners were synthesized and
presented using the Hypersignal software by Hyperception, Inc. The following four
melodies contained target tones that were the fifth degree of the scale: America; Jingle
Bells; Row, Row, Row Your Boat; and Doe, A Deer. The next four melodies contained
target tones that were the second degree of the scale: The Alphabet Song; When the
Saints Come Marching In; Happy Birthday; and Here Comes the Bride.
Procedure: Each participant was asked to judge whether a particular note, the "target" in
each of eight melodies, was in tune or not. The melodies were heard free field and
judgments were entered on an answer sheet. Six judgments were available which
reflected the listener's judgment about whether the note was "right" (in-tune) or "wrong"
(out-of-tune; sharp or flat) and the degree of confidence: definitely right, right, maybe
right, maybe wrong, wrong, definitely wrong. Each listener heard nine versions of each
melody in random order for a total of 72 trials. Each version differed in (1) whether or
not the target was in tune and, if not, (2) the degree to which the target was out of tune.
The nine possible targets were separated by eighth-tone steps. Version one was a
half-tone flat, version two was 3/8th-tone flat, and so on up to version nine, which was a
half-tone sharp. Participants were informed of which note was the target by using the
words of the songs. The words were presented in written form with those words
corresponding to non-target tones in black, lowercase type and those for target tones in
red, uppercase type. For example, while listening to "Row, Row, Row Your Boat",
participants would see the following:
Row, row, row your boat
Gently down the STREAM
Merrily, merrily, ...
They then made their judgment concerning the note corresponding to "stream". These
judgments were made without feedback.
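Assuming equal temperament, the nine detuning levels translate into frequencies as follows: an eighth-tone is 25 cents, so the versions span -100 to +100 cents around the 256 Hz target. The synthesis details are not given in the text, so this is only an illustrative reconstruction.

```python
# Frequencies of the nine versions of a 256 Hz target, detuned in
# eighth-tone (25-cent) steps from a half-tone flat (version 1) to a
# half-tone sharp (version 9); version 5 is in tune.
TARGET_HZ = 256.0

versions = {v: TARGET_HZ * 2 ** (25 * (v - 5) / 1200) for v in range(1, 10)}

print(round(versions[1], 2))  # half-tone flat
print(round(versions[5], 2))  # in tune: 256.0
print(round(versions[9], 2))  # half-tone sharp
```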
Results
An initial analysis showed that responses did not differ according to the absolute frequency of the target.
Therefore, the data were collapsed across both frequencies. Figure 1 shows the total number of "right" responses of
any sort as a function of the number of 1/8 tones the target note was out-of-tune. The bell shape of the curves is
typical of frequency discrimination data, with two exceptions. (1) The point at which the target was most likely to
be heard as in-tune for the P5 condition was not centered on the objectively in-tune frequency, but was slightly
higher in pitch. (2) The curves are steeper (better discrimination) when the target was flat than when the target was
sharp for both P5 and M2.
The signal detection analysis shows a similar pattern. For both P5 and M2, two separate analyses were performed for
when the targets were sharp and when they were flat. Hit rates were the percentage of correctly responding "right"
when the targets were in-tune. The two False Alarm rates were the percentages of incorrectly responding "right"
when the target was either a quarter-tone flat or a quarter-tone sharp. While Hit rates were basically the same in all
four cases, False Alarms were greater when the out-of-tune targets were on the sharp side.
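The sharp/flat asymmetry can be expressed with the standard signal detection sensitivity index, d' = z(hit rate) - z(false-alarm rate). A minimal sketch follows; the hit and false-alarm rates are invented for illustration, since the paper does not report the numeric values.

```python
# d' = z(hit rate) - z(false-alarm rate): the usual signal detection
# sensitivity index, computed separately for flat and sharp mistunings.
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate: float, fa_rate: float) -> float:
    return z(hit_rate) - z(fa_rate)


# Equal hit rates but more false alarms on the sharp side give a lower
# d' (poorer discrimination) for sharp mistunings, as reported.
flat = d_prime(0.80, 0.30)   # quarter-tone-flat targets (invented rates)
sharp = d_prime(0.80, 0.50)  # quarter-tone-sharp targets (invented rates)
print(round(flat, 2), round(sharp, 2))
```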
Experiment Two
In the second study, participants made intonation judgments concerning notes which were the sixth and eighth
degree of the scale in the key of the melody in which they were heard.
Method
Participants: 18 musically-untrained listeners participated as part of one of the options
for fulfilling the experimental methodology activity requirement for an introductory
psychology course at Indiana University Purdue University Fort Wayne.
Materials: Eight melodies which were familiar to the listeners were synthesized and
presented using the Hypersignal software by Hyperception, Inc. The following four
melodies contained target tones that were the eighth degree of the scale: This Old Man;
Santa Claus is Coming to Town; All I Want for Christmas; and Brahms' Lullaby. The next
four melodies contained target tones that were the sixth degree of the scale: Highlands;
Oh, Shenandoah; Bicycle Built for Two; and Amazing Grace.
Procedure: The same procedure was used as in the first experiment.
Results
Once again, an initial analysis showed that responses did not differ according to the absolute frequency of the
target and the data were collapsed across both frequencies. Figure 2 shows the total number of "right" responses of
any sort as a function of the number of 1/8 tones the target note was out-of-tune. The peaks of both curves are at
the point where the targets are in tune. The overall steepness of the curve for P8 is greater than for M6, showing the
P8 to be more easily discriminated from near neighbors. There is an asymmetry in the curves for both P8 and M6
with a steeper fall-off on the flat side indicating better discrimination.
The signal detection analysis shows overall better discrimination for P8. There is a conservative bias to respond
"wrong" when the target is P8 and a bias to respond "right" when the target is M6.
Discussion
First of all, musically-untrained listeners are able to hear relatively small changes in intonation. For all the P8, P5,
and M2 scale degrees, they are able to detect a quarter-tone change from the point of subjective equality (which is
slightly sharp in the case of P5). For the M6, a change of a 3/8 tone is required for reliable discrimination.
Secondly, there are clear differences in the discrimination functions for the different scale degrees. Therefore, it is
evident that musically-untrained listeners' pitch intonation judgments are sensitive to the tonal functions of the
tones. It is not so evident what mechanism(s) is(are) responsible for these differences. It is not a simple case of
more consonant tones being more discriminable than less consonant tones. While P8 targets were most
discriminable, P5 was no better than M2. The differences in response bias for the different tonal functions suggest
that the accuracy of their memory representations may influence the criterion used in accepting a tone as
in-tune. In other words, if one has a good idea of what a note such as P8 should sound like, one will have
a stricter criterion for what one calls in-tune. Of course, that begs the question of why certain tonal functions are
better represented than others. Another possibility is that a perceptual quality associated with the different tonal
functions changes at a different rate in the tones surrounding the target tones used in this study. The differences in
the discrimination functions may reflect the degree of change in this subjective quality at 1/8, 1/4, 3/8, etc. tone
differences from the in-tune note. The greater the local change, the better the discrimination.
References
Dewar, K.M., Cuddy, L.L. & Mewhort, D.J.K. (1977). Recognition memory for single tones with and without
context. Journal of Experimental Psychology: Human Learning and Memory, 3,60-67.
Krumhansl, C.L. (1979). The psychological representation of musical pitch in a tonal context. Cognitive
Psychology, 11, 346-374.
Krumhansl, C.L. & Keil, F.C. (1982). Acquisition of the hierarchy of tonal functions in music. Memory & Cognition.
Back to index
Proceedings paper
Introduction
In this paper I describe the self-regulated use of learning strategies as part of musicians' learning process as they master
musical works for performance. Recent research in areas other than music suggests that self-regulated learning is
crucial for making learning effective (Schunk & Zimmerman, 1998). The construct of self-regulation refers to the degree
that individuals are metacognitively, motivationally, and behaviourally active participants in their own learning
(Zimmerman, 1994). The centrepiece of self-regulated learning is strategy selection, monitoring and revision (Borkowski &
Muthukrishna, 1992). Strategy use becomes self-regulated when students use learning strategies in conjunction with their
own characteristics (e.g. their skills and knowledge), the nature of the task (e.g. its organisation and modality), and other
situational factors (e.g. the purpose of the problem solving activity) (Brown, Bransford, Ferrara, & Campione, 1983).
Previous research in instrumental learning offers only limited information about musicians' self-regulated use of learning
strategies (e.g. Chaffin & Imreh, 1997; Gruson, 1981; Hallam, 1992, 1995; Miklaszewski, 1989).
Considering learning strategies as learning activities aimed at achieving a particular goal (Weinstein & Mayer, 1986), the
present study investigated two organ students' use of learning strategies in the face of different task demands within practice
sessions. It looked at the initial stage of preparing a complex piece for concert performance, and at practice sessions of a
later learning period. It also explored the similarities and differences in self-regulated use of strategies that can be found
between these periods.
Based on the criticism of cognitive strategy research by phenomenographic research on conceptions (Laurillard, 1986;
Marton & Säljö, 1986; Uljens, 1989), the focus was on each student's conception of task demands. The students may
conceptualise task demands differently, and thus different task demands may shape their use of strategies.
Method
The subjects were two third-year organ students at the Norwegian State Academy of Music in Oslo. Their teacher described
them as gifted, possessing a high level of technical skill. The works practised were the Prélude from "Prélude et fugue" in B
major (opus 7) by Marcel Dupré, and the Salve Regina movement from the second Symphony (opus 13) by Charles-Marie
Widor. Both pieces are among the most important works in the organ repertoire of the French Romantic period.
Before recording the initial practice sessions, no special auditory or analytic prestudy work had taken place. The pieces were
part of the students' preparations for their final examinations at the Academy. The students and their teacher selected the
pieces as exemplars of moderate difficulty. This was important because awareness of how we think typically occurs
spontaneously only when our otherwise smooth and well-formed activities do not lead to the desired results or goals. Such
situations arise when the problem being solved is of moderate difficulty (Flavell, 1987).
The results presented are based on data gathered during the first practice session and during and immediately after the
second practice in the first and second learning period (each practice session lasting one hour). The students practised on a
familiar instrument in one of their usual practice rooms. The students gave a concert performance of the pieces a few weeks
after the last recorded session.
The information was gathered through the use of observation of practice behaviour, concurrent verbal reports of
problem-solving activities during a session, and retrospective debriefing reports of problem-solving activities given after
practice (Nielsen, 1997, 1998). During pilot studies the verbal reporting techniques were adjusted to fit the purpose of the
study and the natural practice situation. These procedures were performed according to the guidelines offered by Ericsson
and Simon (1993) and Taylor and Dionne (1994), and they included the conducting of a training session and prompting. The
complementary use of concurrent verbal protocols and retrospective debriefing reports provided frequent opportunities to
verify the data reported by problem solvers and to enhance validity in the interpretation of the data collected.
Considering this, the data for this study consist of a detailed listing of the students' behavioural and verbal activities made
«Otherwise,...musically it is much the same...simple in a way. I suppose, I will practice it [the piece] much in the
same way as I already have been doing.» (Student Dupre)
«I work more with the music, but at the same time so many notes were «uncertain» [errors]. It really ended out
much the same. I had planned to focus on the music, but it is always important to me that technical problems
are sorted out. At the same time, I could imagine how it [the piece] should sound and tried to attain this.»
(Student Widor)
Student Dupre's conception of task demands seems mainly to have been to focus on the more technical problems, and to a
lesser degree on the expressive qualities. Student Widor had planned for and wanted to work on the expressive qualities of the
piece, but as she experienced problems with the reliability of emerging technical plans and their execution, she changed her
focus. However, when she had improved her execution of the parts, she changed back to focus on the expressive qualities.
In their response to technical problems, both students used strategies which reduced the amount of information to be
processed simultaneously (e.g. as in the first learning period). In response to interpretative problems, they played through
larger sections of the piece or took pauses where the score was studied. Both students also stressed playing through a large
part of the piece in a tempo close to the final tempo before they focused on another part. If they identified problems within
the part practised, they continued practising this part. Both emphasised the use of strategies aimed at achieving both rapidity
and accuracy in their performances in this learning period.
The two learning periods compared
The results show the following similarities and differences in self-regulated use of strategies between these periods:
a. the students varied their use of strategies in the face of different task demands in the two learning periods.
b. one student used learning strategies especially in the face of transitions in patterns and to a lesser degree in response to
more complex patterns.
c. the other student used strategies especially in response to more complex patterns, in response to problems with
reading the score caused by an unfamiliar clef, and in the face of interpretative problems.
d. both students used strategies in response to more complex patterns in the second learning period, although complexity
was not the most important task demand for use of strategies for one of the students.
e. some problems were temporary, while others existed throughout each practice session.
f. both students focused on the technical execution in the initial learning period.
g. one student kept a technical focus throughout the whole period, while the other gradually focused more on the
expressive problems of the piece.
h. the students' use of strategies was to some extent not predetermined, but seemed to depend on the success of the
ongoing strategy use and on the students' available and accessible knowledge about their own learning process and the
task.
i. in their response to technical problems, both students used strategies which reduced the amount of information
processed simultaneously.
j. with their decisions to focus on the opposing alternatives «rapidity» and «accuracy» the students formed different
profiles in their use of strategies across the learning periods.
Discussion: A preliminary model
Based on the results from this study and from earlier research on learning activities in instrumental practice (e.g. Chaffin &
Imreh, 1997; Miklaszewski, 1989) and theory of self-regulated learning, a preliminary model of self-regulated use of
learning strategies during instrumental practice is presented below. It demonstrates the complexity and
diversity of the students' process of self-regulated use of strategies in practice, and consists of the different factors that may
influence the student's use of strategies, and their interrelations.
According to the theory of self-regulated learning, important factors are the students' own characteristics (e.g. their skills and
knowledge), the nature of the task (e.g. its organisation and modality), and other situational factors (e.g. the purpose of the
problem solving activity). In the following model, «the students' own characteristics» were conceptualised as their
metacognitive competence and self-efficacy beliefs, «the nature of the task» as their problem beliefs, and «the purpose of the
problem solving activity» as their self-evaluation of their performances during practice.
As such, the core of this model consists of the student's «problem belief», «strategy use» and «self-evaluation», and their
interrelations. Their contents change as the musical work is mastered. In the course of mastery, problem beliefs
may be revised (e.g. technical or expressive problems), and the student's self-evaluation relies on criteria that may be
revised (e.g. rapidity or accuracy criteria).
Results from this study and from earlier research on learning activities in instrumental practice (e.g. Chaffin & Imreh, 1997;
Miklaszewski, 1989), also suggest that the students' problem beliefs are influenced by patterns in the musical material and
that these beliefs may be revised due to the students' evaluation of their performance of the music. Further, the problem
belief may influence the strategy use during practice. Based on theory of self-regulated learning the metacognitive
competence of the students and their self-efficacy beliefs may also influence the strategy use. Their use of strategies may
also be independent of this control.
This implies that a model of the process of self-regulated use of strategies during practice must indicate a relationship
between problem belief and patterns in the musical material, where the musical material forms the basis of the problem
belief. It must also indicate a relationship between problem belief and the student's self-evaluation of the performance, where
the problem belief follows as a consequence of the self-evaluation. Further, it must indicate a relationship between problem
belief and use of strategies, where the use of strategies follows as a consequence of the problem belief. Based on the theory of
self-regulated learning, it must also indicate a relationship between the self-evaluation of the performance and self-efficacy
beliefs, where self-efficacy beliefs follow as a consequence of the self-evaluation. Further, a relationship between
self-efficacy beliefs and a continued use of strategies must be indicated. Finally, it must indicate the possibility of
relationships between the student's metacognitive competence and use of strategies, where the use of strategies follows as a
consequence of the student's metacognitive knowledge and control. These factors and their interrelations are summed up in
the following preliminary model (see Figure 1):
FIGURE 1
This model illustrates the process of self-regulated use of strategies during practice. Based on a problem to be solved, the
student's strategy use, performance of the piece, and self-evaluation of the performance (the black line in the model), this
model indicates four problem solving alternatives to follow:
a. The student evaluates the performance as successful, and focuses on a new problem (the lilac line in the model).
b. The student evaluates the performance as unsuccessful, but has confidence in the chosen strategy to solve the
problem. The student increases the effort by continued use of the same strategy to solve the problem (the red line
in the model).
c. The student evaluates the performance as unsuccessful, and has no confidence in the chosen strategy to solve
the problem. The student increases the effort by revising the strategy use in the continued problem solving (the
green line in the model).
d. The student evaluates the performance as unsuccessful, but the performance gives reason to revise the problem
belief. The student increases the effort by revising the problem belief and then the use of strategy in the
continued problem solving (the blue line in the model).
However, this model is preliminary, and must be considered as forming conjectures of this process and not as a
generalisation of the present study's results.
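The four alternatives above can be read as a simple decision rule. The function below is a hypothetical rendering of the model for illustration only; the labels come from the model, but the function itself is not part of the study.

```python
# Decision sketch of the model's four problem-solving alternatives:
# (a) success -> focus on a new problem; (d) failure that warrants a
# revised problem belief; (b) failure with confidence in the strategy;
# (c) failure without confidence in the strategy.
def next_step(successful: bool, revise_belief: bool, trusts_strategy: bool) -> str:
    if successful:
        return "focus on a new problem"                        # (a) lilac line
    if revise_belief:
        return "revise the problem belief, then the strategy"  # (d) blue line
    if trusts_strategy:
        return "continue with the same strategy"               # (b) red line
    return "revise the strategy"                               # (c) green line


print(next_step(True, False, False))   # -> focus on a new problem
print(next_step(False, False, True))   # -> continue with the same strategy
```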
Conclusions
To conclude, this study shows that able students self-regulate their use of learning strategies in the face of different task
demands. As such it confirms some of the presumed self-regulated use of strategies that earlier research has suggested (e.g.
Chaffin & Imreh, 1997; Miklaszewski, 1989), in particular the relationships between use of strategies and patterns in the
musical work practised, based on the students' task conceptions. On the other hand, some of the hypotheses from earlier
research were not «verified». The present study shows that complexity may be of importance to the students' use of
strategies not only in the initial learning period, but also in later learning periods. It also shows that the students'
self-evaluations rely on criteria that may be revised in the course of mastery.
The results can be seen as demonstrating the student's need to monitor his or her use of strategies during practice as a
prerequisite for being able to self-regulate that use in the face of different task demands in practice.
References
Borkowski, J. G., & Muthukrishna, N. (1992). Moving metacognition into the classroom: "Working models"
and effective strategy teaching. In M. Pressley, K. R. Harris and J. T. Guthrie (Eds.). Promoting academic
competence and literacy in school. San Diego: Academic Press. pp 477-501.
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). Learning, remembering, and
understanding. In P. H. Mussen (Ed.). Handbook of child psychology: Cognitive development. New York:
Wiley. 4th ed. pp 77-166.
Chaffin, R., & Imreh, G. (1997). «Pulling teeth and torture»: Musical memory and problem solving. Thinking
and Reasoning, 3, 315-336.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
Flavell, J. H. (1987). Speculations about the nature and development of metacognition. In F. E. Weinert and R.
H. Kluwe (Eds.). Metacognition, motivation, and understanding. New Jersey: Erlbaum. pp 21-29.
Gruson, L. M. (1981). What distinguishes competence: An investigation of piano practising. Unpublished Ph.D.
thesis, University of Waterloo, Canada.
Hallam, S. (1992). Approaches to learning and performance of expert and novice musicians. Unpublished
Ph.D. thesis, University of London, London.
Hallam, S. (1995, August). Qualitative changes in practice and learning as musical expertise develops. Paper
presented at the VIIth International Conference on Developmental Psychology, Krakow.
Laurillard, D. (1986). Att lära genom problemlösning [Learning through problem solving]. In F. Marton, D.
Hounsell and N. Entwistle (Eds.). Hur vi lär [How we learn]. Stockholm: Rabén & Sjögren. pp 171-197.
Marton, F., & Säljö, R. (1986). Kognitiv inriktning vid inlärning [Cognitive orientation in learning]. In F.
Marton, D. Hounsell and N. Entwistle (Eds.). Hur vi lär [How we learn]. Stockholm: Rabén & Sjögren. pp 56-80.
Miklaszewski, K. (1989). A case study of a pianist preparing a musical performance. Psychology of Music, 17,
95-109.
Nielsen, S. G. (1997). Self-regulation of learning strategies during practice: A case study of a church organ
student preparing a musical work for performance. In H. Jorgensen and A. C. Lehmann (Eds.). Does practice
make perfect? Current theory and research on instrumental music practice. NMH-publications 1997:1. Oslo:
The Norwegian State Academy of Music. pp 109-122.
Nielsen, S. G. (1998). Selvregulering av læringstrategier under øving: En studie av to utøvende
musikkstudenter på høyt nivå. [Self-regulation of learning strategies during practice: A study of two church
organ students]. NMH-publications 1998:3. Oslo: The Norwegian State Academy of Music.
Schunk, D. H., & Zimmerman, B. J. (Eds.) (1998). Self-regulated learning: From teaching to self-reflective
practice. New York: The Guilford Press.
Taylor, K. L., & Dionne, J. P. (1994, April). Accessing problem solving strategy knowledge: The
complementary use of concurrent verbal reports and retrospective debriefing. Paper presented at The American
Educational Research Association Annual Meeting, New Orleans.
Uljens, M. (1989). Fenomenografi - forskning om uppfattningar [Phenomenography - research on conceptions].
Lund: Studentlitteratur.
Weinstein, C. E., & Mayer, R. E. (1986). The teaching of learning strategies. In M. C. Wittrock (Ed.).
Handbook of research on teaching. New York: Macmillan. pp 317-327.
Zimmerman, B. J. (1994). Dimensions of academic self-regulation: A conceptual framework for education. In
D. H. Schunk and B. J. Zimmerman (Eds.). Self-regulation of learning and performance: Issues and educational
implications. New Jersey: Erlbaum. pp 3-21.
Proceedings paper
1 Introduction
1.1 Categorisation and similarity
The ability to classify musical styles is an important and intriguing task from the perspective of music cognition. This process, which
listeners usually perform effortlessly, involves integrating a number of perceptual processes. Recent summaries divide categorisation
into two kinds of process: (1) rule application and (2) similarity computations (Smith, Patalano, & Jonides, 1998; Hahn & Chater, 1998). This paper
considers the latter using the statistical frequencies of events, which have been shown to be influential in the learning and perception of
language and sound patterns (e.g. Saffran, 1999). We also limit our inquiry to melodic similarity, since this allows us to test and
develop the frequency-based measures of melodic similarity that aim to tackle some of the categorisation and classification challenges
music history holds for us.
1.2 Melodic similarity
There has been a moderate amount of research into melodic similarity and a number of experiments have shed light on the parameters
that give rise to this phenomenon. Findings by Dowling (1971, 1978) indicate that one of the main factors of similarity is contour
information, which is essential in short-term comparisons (Dowling & Bartlett, 1981) and in shorter melodies (Edworthy, 1985; Cuddy
et al, 1981). Some studies have concentrated on melodic archetypes (Rosner & Meyer, 1982, 1986), hierarchical structure (Serafine,
Glassman & Overbeeke, 1989), themes (Pollard-Gott, 1983), motifs (Lamont & Dibben, 1997), whether melodies use scalar or
non-scalar tones (Bartlett & Dowling, 1980; Dowling & Bartlett, 1981, 1988), and more recently, on transposed melodies (van Egmond
et al, 1996), the effects of pitch direction, contour and pitch information (Dewitt & Crowder, 1986; Eiting, 1984; Freedman, 1999;
Hofmann-Engl & Parncutt, 1998), and pitch range and key distance (van Egmond & Povel, 1994). Commonly, rhythm has been
considered as a separate entity (Palmer & Krumhansl, 1990; Simpson & Huron 1993; Gabrielsson, 1973) except by Monahan &
Carterette (1985), who studied both rhythm and tonal dimensions as constituents of similarity.
Theoretical models of melodic similarity include Cambouropoulos' (1995, 1997) formal definition of similarity based on the number of
coinciding attributes of melodies. Smith, McNab & Witten (1998; also Orpen & Huron, 1992) have defined similarity as the
complexity of the transformation process involved in mapping one object onto the other. Models that deal with contour and interval
information of the melodies include work by Deutsch & Feroe (1981), Ó Maidín (1998), and Hofmann-Engl & Parncutt (1998). The
wide range of research foci and models can be attributed to the multidimensional nature of melodic similarity. The
approach used in this paper differs from previous approaches in that both rhythm and pitch are treated as statistical
entities that are hypothesised to provide perceptually salient cues for similarity.
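To make the frequency-based approach concrete, the sketch below compares two melodies by the distributions of their interval transitions. The encoding and the city-block similarity measure are illustrative assumptions, not the exact measures used in this study.

```python
from collections import Counter

def transition_distribution(events):
    """Relative frequencies of successive pairs (e.g. of intervals or durations)."""
    pairs = Counter(zip(events, events[1:]))
    total = sum(pairs.values())
    return {p: n / total for p, n in pairs.items()}

def distribution_similarity(da, db):
    """1 minus half the city-block distance: 1.0 = identical distributions."""
    keys = set(da) | set(db)
    return 1 - 0.5 * sum(abs(da.get(k, 0) - db.get(k, 0)) for k in keys)

def intervals(melody):
    """Successive pitch intervals, making the comparison transposition-invariant."""
    return [b - a for a, b in zip(melody, melody[1:])]

# Two fragments as MIDI pitches (illustrative data); same shape, a fifth apart
melody_a = [60, 62, 64, 62, 60]
melody_b = [67, 69, 71, 69, 67]
sim = distribution_similarity(transition_distribution(intervals(melody_a)),
                              transition_distribution(intervals(melody_b)))
print(sim)   # -> 1.0
```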
1.3 Similarity and statistical properties of the melodies
Classifying melodies and musical styles according to the statistical distribution of different intervals, rhythmic patterns, or pitches has a
long history in ethnomusicology (Freeman & Merriam, 1956; Lomax, 1968). Research on music cognition and learning has
demonstrated the effect of statistical information on learning and perception in both cross-cultural studies (Castellano et al, 1984;
Kessler et al, 1984; Krumhansl et al, 1999) and studies using melodies in which the statistical properties of the music have been
intentionally manipulated (Oram & Cuddy, 1995). The results show that listeners are sensitive to pitch-distributional information.
Evidence from different modalities has also shown the importance of frequency information in cognitive processes (e.g., Saffran et al,
1999). In light of this evidence, it seems that statistical properties of melodies could provide a means for classification of musical styles
in terms of their perceptual similarity. Indeed, studies using this approach have been successful, for example, Järvinen, Toiviainen &
Louhivuori (1999) classified ten different musical styles based on the distributions of tones and tone transitions. The results, which
Figure 1. Two-dimensional classical multidimensional scaling solution (R2 = .81, N = 15). Y=Yoiks,
H=Hymns, G=German, I=Irish, C=Greek.
3.2 Association between the statistical properties of melodies and listeners' similarity ratings
A comparison between the similarities derived from the statistical properties of the melodies and listeners' similarity judgements was
done by multiple regression analysis. The similarity measures were regressed upon similarity ratings of listeners for all pairs of
melodies. The overall prediction rate was fairly low (R2 = .41, F(3, 101) = 22.67, p < .001): the distribution of
duration transitions explained 20%, note transitions 13%, and normalised interval difference (of the first 12 and last 12 intervals) 6% of the
variance in listeners' similarity ratings. In other words, melodies that possessed similar rhythms and had similar note transitions and
interval differences were judged to be more similar by the listeners. In a previous study (Hofmann-Engl & Parncutt, 1998), normalised
interval difference accounted for 76% of the similarity judgements, but this was not the case here. One reason could be the
difference in the number of tones in the melodies, which makes it difficult to apply this measure properly. As the connection between
the statistical properties and the perceived similarities of the melodies was only moderate, the salient dimensions of listeners' ratings were
studied in more detail.
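The analysis logic can be illustrated with a toy regression. The study used multiple regression over three predictors; for brevity this sketch fits a single predictor by ordinary least squares and reports R-squared. The data are invented.

```python
def r_squared(x, y):
    """Variance in y explained by a least-squares line on x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Invented data: a statistical similarity measure vs. mean listener rating
measure = [0.9, 0.7, 0.4, 0.2, 0.8, 0.5]
rating = [6.1, 5.0, 3.2, 2.4, 5.6, 3.9]
print(round(r_squared(measure, rating), 2))
```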
3.3 Salient dimensions of the melodies
The dimensions of the scaling solution were correlated with the descriptive variables of the melodies. Dimension 1 in the two
dimensional solution correlated with mean pitch, predictability, registral return, registral direction, and rhythmic activity (r= .926, r=
.804, r= .549, r= .594, r= .629, respectively, p< .05 and df= 13 in all cases). Dimension 2 correlated with rhythmic variability (r=.662,
df=13, p<.01). There is no single obvious interpretation of dimension 1, but it could be read as the predictability of the melodies,
reflecting the regularity of large intervals and mean pitch height. Dimension 2, in turn, can be interpreted as a rhythmic dimension. In
the three dimensional solution, the results displayed the same pattern of correlations and the third dimension correlated with the
Proceedings paper
Engagement and Experience: A Model for the Study of Children's Musical Cognition
Lori A. Custodero
Teachers College, Columbia University
lac66@columbia.edu
Background
Engagement in authentic musical experiences provides a critical vantage point from which to study musical thinking.
Observations of children's active engagement with the environment suggest that their creative solutions to presented intellectual
problems provide insight into their cognitive processes (Bloom, 1993; Duckworth, 1996). Although investigations of children's
musical problem solving in clinical contexts (e.g. Bamberger, 1999) have led to important theories regarding musical thinking,
these studies typically involve older children. The current research model, based on the paradigm of flow experience
(Csikszentmihalyi, 1975, 1988, 1993, 1997), was developed to systematically address the complexities of musical problem
solving (and problem finding) behaviors in young children's naturalistic environments.
Flow experience refers to a state of optimal enjoyment, when participants are thoroughly involved in intrinsically rewarding
activity. Defined by an individual's perceived match between high challenge and high skill levels, it creates an ideal learning
situation, since in order to sustain flow, skills must improve to meet challenges. This dynamic interaction between skills and
challenges, also known as emergent motivation (Csikszentmihalyi, 1982), is self-perpetuating: As an individual's skill level
improves through practice, challenges must become increasingly complex. Observing children's attempts to solve the problem
of sustaining flow by adjusting their own challenge levels in an intrinsically rewarding activity such as music making
will conceivably provide information regarding cognitive processes.
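One common way to operationalize this challenge-skill dynamic is the quadrant reading of Csikszentmihalyi's model, in which flow requires both challenge and skill above the individual's own average levels. The sketch below is a generic illustration of that idea, not the coding scheme used in this study; the numeric values are hypothetical.

```python
def experience_state(challenge: float, skill: float,
                     mean_challenge: float, mean_skill: float) -> str:
    """Quadrant reading of the flow model: both dimensions are judged
    relative to the person's own average levels (hypothetical sketch)."""
    high_challenge = challenge > mean_challenge
    high_skill = skill > mean_skill
    if high_challenge and high_skill:
        return "flow"
    if high_challenge:
        return "anxiety"   # challenge outstrips skill
    if high_skill:
        return "boredom"   # skill outstrips challenge
    return "apathy"

print(experience_state(8, 8, 5, 5))   # -> flow
```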
Although flow experience has traditionally been studied in adults and adolescents, it may be hypothesized that young children
are in constant pursuit of flow-type experiences. Maintaining the flow state may represent an inherent means of engaging
with the world, with their need to remain sufficiently challenged reflecting their expanding skill base. Several models from
developmental psychology speak to the intrinsic nature of challenge-seeking behavior. Studies of mastery motivation focus on
children's dispositions for engagement in appropriately challenging activities (Barrett, Morgan, & Maslin-Cole, 1993; McCall,
1995). Children's inherent drive to learn about the world through transforming phenomena in their environment is posited by
Feldman (1994) as a rationale for creativity. One might interpret this "transformational imperative" as a flow sustaining
strategy, especially in the context of creative musical experience.
As children strive to keep skills and challenges balanced through mastering or transforming activity, they are employing both
problem solving and problem finding behaviors. In his studies on children's mathematical thinking, Siegler (1996) has found
variability within and between children as they demonstrate choices of cognitive strategies, supporting the idea that children's
solutions to problems are not universal or simply categorized.
Drawing upon the evidence above, the observational flow model is grounded in the following assumptions: (a) flow experience
provides a window into cognition by virtue of its challenge-skill dynamic; (b) young children are compelled to seek out
appropriately challenging activity to maintain the flow state; and (c) the strategies young children use to monitor their flow
states during musical experiences may provide cues to their musical thinking.
Since conventional flow research relies on self-reports of adolescents and adults, operationalizing children's flow-monitoring
behaviors was crucial to adapting the original methodology for younger participants. A valid and reliable observational protocol
was developed for use with young children ages 4-5 in a music instructional setting utilizing the researcher-developed Flow
Indicators in Musical Activities (FIMA) coding form (Custodero, 1998). This measurement tool included nine affective and
eight behavioral indicators, as well as a global measure of flow used as the dependent variable in multiple regression tests. Both
groups of indicators were submitted to factor analysis resulting in four affective and three behavioral factors.
Affective factors found to predict flow included Potency (alert, involved, active) and Self-concept (satisfied, successful),
replicating findings in conventional flow research (e.g., Csikszentmihalyi & Larsen, 1987). Behavioral factors found to predict
flow were more provocative. Skill loaded with anticipation, expansion, and extension -- consistently observed transformations
of the teacher-delivered material. Challenge, defined by self-assignment and self-correction behaviors as well as a deliberate,
controlled quality of gesture, loaded with adult awareness. Peer awareness loaded with imitation intensity, which did not predict
Discussion
Results indicate that flow-related behaviors are observable, supporting findings in the original study (Custodero, 1998).
Self-assignment, self-correction, deliberate gesture, anticipation, expansion, and extension were all positively associated with
skill, focus, and involvement. Similar pervasive influences of social interaction were observed: Children looked to adults to
help define challenge in tasks; they looked to peers for imitation when the challenge level was perceived as inappropriate. Data
in this exploratory study revealed issues of development, environment, and individual temperament.
McCall's (1995) cautionary note that mastery motivation cannot occur before the infant distinguishes between cause and effect
was supported in the observations of infants' musical self-assignment and self-correction - infants over 12 months showed more
of these behaviors. Toddlers exhibited an abundance of self-assignment behaviors; self-correction, however, was superseded by
the compelling nature of anticipating the musical cues, especially those associated with movement. The within-session delayed
participation observed in the infant group was not apparent in the toddler group, which speaks to the perception of structured
activity in these older children.
Self-initiated activity in the presence of adults was less observable in Groups 3-4. It is believed that dispositions toward
appropriate behavior in learning environments had been established by the schooling experience of these children, resulting in
the waning of flow in the context of formal instruction suggested by Csikszentmihalyi (1993).
A deliberate quality of gesture was the most universally applicable indicator of flow by virtue of its representation of focused
involvement. Older infants and toddlers were very deliberate in their rhythmic movement responses to live and recorded music;
older children may have internalized this seemingly inherent response. The manipulation of musical instruments provided an
important means for observing quality of gesture for all ages; this is supported by the original study. Especially noteworthy was
the similarity between the use of hand gesture by the infants and toddlers as a means to express song repertoire and the elevated
singing skill experienced by the older children in the Dalcroze group who used deliberate hand gestures to assist in pitch
accuracy. Goldin-Meadow (2000) hypothesizes that gesture "has the potential to be involved in innately driven as well as
non-innately driven learning - that is, to be a general mechanism of cognitive growth" (p. 237).
The transformational behaviors of anticipation, expansion, and extension were highly influenced by both developmental and
environmental factors. Anticipation was not consistently apparent in the infants; there were notable increases in its frequency in
toddlers, and even more in the preschoolers from the original study. Since this behavior relies upon the immediate response to
instructional presentation, it is more dependent upon the quality of interaction between teacher and student.
Expansion was evident in the younger groups in spontaneous movement responses; Group 4 also exhibited expansion in their
References
Bamberger, J. (1999). Learning from the children we teach. Bulletin of the Council for Research in Music Education, 142,
48-74.
Barrett, K. C., Morgan, G. A., & Maslin-Cole, C. (1993). Three studies on the development of mastery motivation in infancy
and toddlerhood. In D. Messer (Ed.), Mastery motivation in early childhood: development, measurement, and social processes
Proceedings paper
While listening to music, people often form a simplified representation of the temporal structure in the music. This representation often takes the
form of an approximately isochronous series of pulses with inter-pulse intervals near 600 ms (Clarke, 1999; Fraisse, 1982; Parncutt, 1994; Snyder &
Krumhansl, 1999; van Noorden & Moelants, 1999). These pulses not only provide the basis of our perceptual representations of rhythm and meter, but may also
underlie our ability to produce rhythmically appropriate actions (Schubotz, Friederici, & Cramon, 2000) such as playing music in an ensemble, dancing in time
with music, or simply tapping our feet or fingers to the perceived pulse.
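Given a notated tempo, the metrical level whose inter-pulse interval lies nearest this preferred period can be found mechanically. A minimal sketch; the level names and the 600 ms default are assumptions drawn from the literature cited above, not part of this study's method.

```python
def preferred_pulse_level(quarter_bpm: float, preferred_ms: float = 600.0) -> str:
    """Choose the metrical level whose inter-pulse interval is closest to
    the preferred pulse period (~600 ms in the literature cited above)."""
    quarter_ms = 60000.0 / quarter_bpm
    levels = {"eighth": quarter_ms / 2, "quarter": quarter_ms,
              "half": quarter_ms * 2, "whole": quarter_ms * 4}
    return min(levels, key=lambda name: abs(levels[name] - preferred_ms))

print(preferred_pulse_level(100))   # quarter note = 600 ms -> quarter
```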
While the output of the human pulse-finding system is relatively simple (i.e., a quasi-isochronous series of impulses), the computational ability to reduce the
temporal complexity of music to a pattern of pulses is extremely impressive. In this sense, pulse-finding is analogous to the equally impressive ability to reduce
improvised melodies to a simple theme (Large, Palmer, & Pollack, 1995). Whereas in melodic reduction, subjects identify which notes are most essential, in
pulse-finding subjects must find the primary durational unit, the period, and its onset position with respect to musical events, the phase. Furthermore, the
representation of pulse is dynamic in that the period and phase can change incrementally to compensate for small-scale timing perturbations (Semjen, Vorberg,
& Schulze, 1998; Thaut, Miller, & Schauer, 1998), and categorically from one global state to another (e.g., from a quarter-note period to an eighth-note period,
or from a down-beat phase to an up-beat phase).
In many polyphonic musical styles, certain instruments are designated to provide rhythmic and harmonic structure supporting the melody. Examples include the
rhythm section in jazz, parts of the orchestra during a symphony or concerto, and the left-hand or thumb in many styles of piano and guitar music, respectively.
Therefore, in a recent study on pulse-finding in piano ragtime (Snyder & Krumhansl, 1999), it was not surprising that removing the left-hand part led to
degraded performance in pulse-finding. However, the situation is potentially different in contrapuntal music in which no one voice constantly provides the
rhythmic structure. This democratic quality of contrapuntal music gives rise to a potential problem in identifying how the perception of pulse arises. By simply
examining scores of such music, it is not always apparent what information is available for pulse-finding. It seems likely that pulse-finding in contrapuntal
music is generally more difficult than in music with a rhythm section because of the attentional demands associated with searching for a voice carrying pulse cues.
Contrapuntal music raises other interesting questions for students of pulse-finding, such as the extent to which one voice will dominate attention at any given
point in time, what musical features determine the voice to which attention is directed, and whether the perception of pulse can be influenced by multiple voices
at once.
In addition to behavioral studies of pulse-finding, many researchers have proposed computational models to account for this ability. Before describing these
models, it is useful to keep in mind what properties we desire in a model of pulse-finding. Firstly, a model of pulse-finding must exhibit the ability to find the
pulse that humans hear for a given excerpt of music, and in a similar amount of time as humans. Secondly, the model must exhibit robust performance in the
presence of the normal timing deviations found in human musical performance (Palmer, 1997). Thirdly, the model should show instabilities of period and phase
when humans do. And lastly, before making any strong psychological or biological claims about the model, one must show that the model relies on similar
mechanisms as people. While this last test of a model is crucial from the standpoint of experimental psychology, we will provide preliminary data establishing
the mathematical validity of a model of pulse-finding, using contrapuntal music.
Currently, the predominant class of models used for musical pulse-finding relies on oscillatory units that entrain in a 1:1 fashion to periodic components in
musical stimuli (Gasser, Eck, & Port, 1999; Large, 1994; Large & Jones, 1999; Scheirer, 1998; Toiviainen, 1998; for a detailed examination of rule-based
models though, see Desain & Honing (1999)). Oscillatory modelers have proposed adaptive oscillation as a mechanism for dynamically tracking the pulse in
real musical performances. In other words, the oscillatory units in these models are able to adjust the period and phase to compensate for temporal deviations
from isochrony in the music. In addition, they give a simple quasi-periodic response to complex musical patterns, corresponding well to the behavioral output of
human pulse-finding. This is attractive because the proposed oscillatory mechanism could not only give rise to the sensation of pulse but also could directly
drive rhythmic sensory-motor output such as tapping. Oscillatory models bear important similarities to an influential hypothesis of rhythmic cognition,
Dynamic Attending Theory (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999). This theory assumes that internal attentional rhythms underlie our ability
to track and find structure in time-varying patterns. Such a mechanism may be used to represent time intervals, perceive metrical structure in speech and music,
and produce rhythmic motor patterns.
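The core adaptive mechanism can be sketched as an error-correcting oscillator: each predicted beat is compared with the corresponding onset, and fractions of the timing error feed back into the phase and the period. This generic sketch is in the spirit of the models cited above, not a reimplementation of any one of them; the coupling constants are arbitrary.

```python
def track_pulse(onsets, period, alpha=0.25, beta=0.1):
    """Entrain a single oscillator 1:1 to a list of onset times.

    alpha: fraction of each timing error used to correct the phase
    beta:  fraction used to adapt the period itself
    Returns the predicted beat times (generic sketch, arbitrary constants).
    """
    beats = [onsets[0]]                 # assume the first onset is on the beat
    expected = onsets[0] + period
    for onset in onsets[1:]:
        error = onset - expected        # deviation from the prediction
        expected += alpha * error       # phase correction
        period += beta * error          # period adaptation
        beats.append(expected)
        expected += period
    return beats

# Isochronous onsets (in ms) are tracked exactly; a late onset pulls the
# next predicted beat part of the way toward it.
print(track_pulse([0, 600, 1200, 1800], period=600))
```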
Past work shows that adaptive oscillator models can dynamically track musical and non-musical patterns (Gasser, Eck, & Port, 1999; Large, 1994; Large &
Jones, 1999; Scheirer, 1998; Toiviainen, 1998). However, experimentalists have yet to thoroughly compare the models to human performance on musical
pulse-finding. Therefore, the first goal of this study is to collect new behavioral data on human pulse-finding and to determine whether an oscillator model of
pulse performs similarly to humans on this task. The particular model we test uses oscillating units similar to the one described by Toiviainen (1998; see
Toiviainen & Snyder, 2000 in these proceedings for new developments).
For the behavioral experiment and for testing the model, we chose a single contrapuntal organ duet composed by J.S. Bach (1685-1750), BWV 805. It is the
final work in a series of four organ duets for a single performer. More specifically, the left-hand of the organist plays the bottom voice, while the right-hand
plays the top voice. We refer to the two imitative voices as the right-hand part (RH), the left-hand part (LH), and to the two parts together as both. In the
experiment we present the RH, the LH, and both versions of short excerpts to subjects who tap the perceived pulse of the music. We then attempt to model
human performance with the adaptive oscillator system described above, and analyze musical features that possibly influence performance.
EXPERIMENT
For this experiment, we selected 8 eight-measure excerpts from a MIDI version of BWV 805. For each excerpt, LH, RH, and both versions were presented to
subjects on separate tapping trials. We will focus on determining the relative influence of the two voices on performance, whether starting position of the
excerpts influences tapping performance, and whether spontaneous tapping tempo predicts the period of musical tapping.
Method
Subjects
Figure 1. Three schematic measures depicted at the eighth-note level with 2/2 meter. Taps are indicated by bold, underlined metrical positions. Fourteen
possible periodic modes of tapping are shown, each characterized by a period (2, 4 or 8 eighth-notes) and a phase. Aperiodic tapping is tapping with a period
other than 2, 4, or 8 eighth-notes.
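Under this scheme, a mode label such as 4_1 can be assigned mechanically from the tap positions. The sketch below assumes taps already quantized to 1-based eighth-note positions in a continuous count; this encoding is our assumption for illustration, not the authors' scoring procedure.

```python
def tapping_mode(tap_positions):
    """Label a run of taps as 'period_phase' (e.g. '4_1'), or 'aperiodic'.

    tap_positions: 1-based eighth-note positions in a continuous count.
    Periodic tapping must use a constant inter-tap interval of 2, 4, or 8
    eighth-notes, as in Figure 1 (simplified sketch of that scheme).
    """
    intervals = {b - a for a, b in zip(tap_positions, tap_positions[1:])}
    if len(intervals) != 1:
        return "aperiodic"
    period = intervals.pop()
    if period not in (2, 4, 8):
        return "aperiodic"
    phase = (tap_positions[0] - 1) % period + 1   # position within one cycle
    return f"{period}_{phase}"

print(tapping_mode([1, 5, 9, 13]))   # taps on the downbeats -> 4_1
```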
The top half of the Appendix displays the average value on each variable, for each stimulus, across all subjects. The three periodic modes in which subjects
mostly tapped are the first three columns in the Appendix. These modes are 2_1, 4_1, and 4_3. For each stimulus, subjects mostly tapped in mode 4_1.
However, for these data, it is important to note differences between subjects. Seven of the subjects tapped over 85% of the time with a mode of 4_1, while three
subjects tapped over 85% of the time with a mode of 2_1. The remaining three subjects tapped around 50% of the time with a mode of 4_1 and between 20-50%
of the time with a mode of 4_3. In other words, for all subjects, the tapping period was extremely consistent over the course of the experiment. Most subjects
consistently tapped with a period of four eighth-notes, while the remaining subjects tapped with a period of 2 eighth-notes. In general, subjects tapped with
similar modes across the three excerpt versions. None of the main effects of version were significant for tapping to the three predominant modes or for tapping
aperiodically. This indicates that both the RH and LH have sufficient cues to pulse, and that neither seems to play a consistently dominant role in pulse-finding.
HOW TWO VOICES MAKE A WHOLE:
Additionally, there was a non-significant tendency for subjects to tap more with a mode of 4_1 for excerpts that started on the first beat of the measure, and vice
versa for 4_3. For excerpts that began on the first beat of the measure, subjects tapped 68% of the time in 4_1 and 5% of the time in 4_3, whereas for excerpts
that began on the third beat of the measure, subjects tapped 57% of the time in 4_1 and 16% of the time in 4_3.
Puzzlingly, subjects' mean spontaneous tapping period did not correlate with the mean proportion of taps with a period of four eighth-notes for each stimulus,
r(334) = -.03, or with any of the specific tapping mode proportions. The mean spontaneous tapping period across subjects ranged from 554-991 ms inter-tap
interval (ITI) with a mean of 754 ms, notably higher than the conventional 600 ms value cited in the literature for preferred tempo (Clarke, 1999; Fraisse, 1982;
Parncutt, 1994; van Noorden & Moelants, 1999).
Subjects switched period .15 times per trial and switched phase .30 times per trial. Slightly fewer switches in period occurred for the both versions though the
effect of version was not significant, p = .14. However, the effect of version for switches in phase was significant, with fewer switches occurring for the both
versions, F(14,182) = 5.29, p < .025. For beats to start tapping (BST), subjects started tapping periodically after around 8 eighth-notes, or one measure. This shows extremely fast
synchronization, considering that one measure is only two tapping cycles for the subjects who tapped with a period of four eighth-notes. BST differed between
versions, F(2,26) = 9.02, p < .005, with a mean of 7.8 for both versions and over 9.0 for the LH and RH versions.
To summarize the findings, subjects showed generally better performance when both voices were present, although the effects were generally not large.
Importantly, the mode of tapping was not strongly influenced by whether the LH, RH, or both voices were present. This is in contrast to previous findings in
ragtime, in which the LH part seemed to play a predominant role in pulse-finding (Snyder & Krumhansl, 1999). In the present case however, the findings do not
mean that for each excerpt, both voices had an equal role in pulse-finding. Instead, it is possible that the voice that most influenced performance simply differed
between the different excerpts. This is a point for future analyses. The most popular mode of tapping for subjects was 4_1, indicating a period of four
eighth-notes (tapping twice per measure), with a phase that includes the first and fifth eighth-note positions in the measure. Thus, a clear preferred mode of
tapping emerged which corresponded very well to the notated meter. Subjects tended to tap more in mode 4_1 for excerpts that started on the first rather than
the third eighth-note beat. Lastly, there was no relationship between spontaneous tapping rate and the period of musical tapping.
MODEL
To test the general performance of the model against the human data, we tested the model on each of the twenty-four stimuli from BWV 805. We only give a
brief description of the model here. Toiviainen (1998) describes the basic oscillatory unit, but for a detailed explanation of the current model, see Toiviainen &
Snyder (in these proceedings). The present model consists of a bank of oscillators that become active after the beginning of the music. Each oscillator
phase-locks to the music with a specific mode, characterized by its period and phase with respect to the music. Each oscillator is also characterized by its
resonance strength, or its activity value. The oscillator with the highest resonance value, the winner, drives the output of the model, which is a series of discrete
pulses, analogous to human taps. Meanwhile, the resonance values of the other active oscillators continue to evolve over the entire musical excerpt. In this
manner, a challenging oscillator can overtake the current winner. However, this only occurs if the challenger's resonance value exceeds the winner's resonance
value by a certain percent. The model is most influenced by the presence of notes with relatively long durations. In other words, if a particular metrical position
contains long notes, the model is likely to tap with a period and phase defined by this metrical position. Therefore, the model is not strongly influenced by ornamentation such as trills or appoggiaturas.
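The winner-takes-the-output rule with a switching threshold can be sketched minimally as follows. This is only an illustration of the selection logic described above, not the oscillator dynamics themselves (for those, see Toiviainen & Snyder); the oscillator labels, resonance values, and the 10% margin are assumed for the example:

```python
# Minimal sketch of the model's winner-selection rule: the current winner
# drives the output until a challenger's resonance exceeds it by a fixed
# percentage (the 10% threshold here is an illustrative assumption).
def track_winner(resonance_series, threshold=0.10):
    """resonance_series: list of dicts mapping oscillator mode -> resonance."""
    winner = None
    winners = []
    for resonances in resonance_series:
        challenger = max(resonances, key=resonances.get)
        if winner is None:
            winner = challenger
        # Hysteresis: switch only on a clear margin over the current winner.
        elif resonances[challenger] > resonances[winner] * (1 + threshold):
            winner = challenger
        winners.append(winner)
    return winners
```

With this rule, a challenger that is only marginally stronger (e.g., 1.05 vs. 1.0) does not displace the winner, which keeps the model's tapping stable.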
Because the model is deterministic, we only tested the model once on each of the twenty-four stimuli. The behavior of the model, for each excerpt, is shown in
the bottom half of the Appendix for all performance variables. Below this are correlation coefficients between the human and model performance on each
variable across the twenty-four stimuli. Currently, the pattern of performance for the model across the stimuli does not match the human performance well at
all. None of the correlations are close to significance. For mode of tapping, this is clearly because the model tends to tap with a mode of 4_3, while the human
subjects more often tap with a mode of 4_1.
file:///g|/Mon/Snyder.htm (5 of 10) [18/07/2000 00:32:43]
HOW TWO VOICES MAKE A WHOLE:
In terms of absolute performance levels on the other measures, the model compares reasonably well to humans. The model taps slightly more aperiodically than
humans, with 8% of taps for the model versus 2% of taps for humans. In contrast, humans switch tapping period and phase more than the model, with values of
.15 and .30 switches in period and phase for humans, and values of .08 and .02 for the model. However, these values all indicate stable tapping performance.
For beats to start tapping (BST), mean performance is very close between humans and the model, with 8.82 BST for humans, and 8.25 for the model.
MUSIC ANALYSIS
To determine musical cues to pulse that were available to both humans and the model in the stimuli, we analyzed the music on a number of dimensions on the
eighth-note level, thus ignoring ornamentation. Our approach was to focus on building a measure of accent based on two dimensions, the number of note onsets
and their duration (inter-note interval within a voice). Following Parncutt's (1994) model, we define durational accent, Ad, at time t as follows:

Ad(t) = (1 - e^(-d/τ))^i

where d is the inter-note interval between a note at time t and the next note at t+1 in a particular voice. The time constant, τ, is 500 ms and corresponds to the saturation duration; the accent index, i, is 2, corresponding to the minimum discriminable IOI difference. These parameter values are those used by Parncutt (1994). We add the durational accent, Ad(t), to onset presence, O(t):

A(t) = Ad(t) + O(t)

to obtain the total accent value, A(t). O(t) equals 0 for absence and 1 for presence of a note onset at each eighth-note position. A(t) was calculated at each t, for
each voice separately. A(t) can range from 0 to 2 for each voice independently because both Ad(t) and O(t) range between 0 and 1. To obtain a measure for both
voices together, we took the mean of the values for the two voices at each t. Because this particular calculation of accent is not experimentally verified, we use
it simply as an indication of note events and their duration across two different musical dimensions and across the two contrapuntal voices. We apply two types
of analysis to this measure of accent: auto-correlations to determine periodicity information in the stimuli, and average value per eighth-note metrical position to
determine phase information in the stimuli.
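The accent computation described above can be sketched as follows. The durational-accent form (1 - e^(-d/τ))^i follows Parncutt (1994); the list-based data layout (inter-note intervals in ms at each eighth-note position, None where a voice has no onset) is an assumption made for the example:

```python
import math

TAU = 500.0       # saturation duration (ms), per Parncutt (1994)
ACCENT_INDEX = 2  # accent index i, per Parncutt (1994)

def durational_accent(d):
    # Ad(t): grows with inter-note interval d (ms) and saturates near 1.
    return (1 - math.exp(-d / TAU)) ** ACCENT_INDEX

def total_accent(voice):
    # A(t) = Ad(t) + O(t): an onset contributes O(t) = 1, absence gives 0.
    return [durational_accent(d) + 1 if d is not None else 0.0 for d in voice]

def combined_accent(voice1, voice2):
    # Mean of the two voices' accent values at each eighth-note position.
    return [(a + b) / 2 for a, b in zip(total_accent(voice1), total_accent(voice2))]
```

Since Ad(t) and O(t) each lie between 0 and 1, each voice's A(t) lies between 0 and 2, as stated above.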
Auto-correlation, displayed in an auto-correlogram, refers to correlating a series of numbers with itself at different relative phase lags (or phase shifts). The size
of the correlation coefficient at a particular phase lag indicates the strength of periodicities in the series corresponding to that phase lag. Based on the 2/2 meter
and on subjects' performance, we would expect to find strong correlations at lags of 2, 4, and 8 eighth-notes, corresponding to quarter-note, half-note, and
whole-note periods, respectively.
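The lag-domain analysis described above can be sketched in a few lines; the normalization by total variance is one common convention, assumed here:

```python
# Sketch of an auto-correlogram: correlate the accent series with itself
# at eighth-note lags; large coefficients mark strong periodicities.
def autocorrelogram(accents, max_lag):
    n = len(accents)
    mean = sum(accents) / n
    dev = [a - mean for a in accents]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / var
            for lag in range(1, max_lag + 1)]
```

For an accent series with an 8-eighth-note cycle, the coefficient at lag 8 would dominate, mirroring the measure-length peak reported below.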
As shown on the left of Figure 2, across all excerpts, the strongest peak in the auto-correlogram is at a period of 8 eighth-notes, or the measure length. The next
highest peak is at a period of 2 eighth-notes, or the quarter note duration. Interestingly, these peaks are both larger than the peak at 4 eighth-notes, which is the
period with which most subjects tapped. One explanation for this is that subjects are unlikely to tap at a whole-note period because this is 1600 ms, near the
upper-bound of synchronization. Similarly, tapping at the quarter-note period, 400 ms, is close to the lower-bound of synchronization. Therefore, tapping with
an 800 ms period, the half-note, may simply be the most comfortable tapping period that fits with the metrical structure.
For the average accent per metrical position (shown on the right of Figure 2), there are clear peaks at odd metrical positions, corresponding to relatively strong
positions. In particular, the strongest peaks across excerpts occur at positions 5 and 3. Strong peaks also appear on positions 1 and 7. Given this analysis, it is
perhaps not surprising that the model tapped predominantly at 4_3 because both positions 3 and 7 had a large number of note onsets and also had relatively long
duration notes. In addition, there were many trills on position 5, leading to lower weighting of this position by the model. In contrast to the model, subjects
predominantly tapped with a mode of 4_1, or at metrical positions 1 and 5.
Figure 2. On the left, the auto-correlogram of accent per eighth-note lag across the eight excerpts for the versions with both voices. On the right, the average
accent per eighth-note metrical position across the eight excerpts for the versions with both voices.
One possibly important difference between the model and humans is that the model does not take melodic information into account in determining which notes
are salient. Therefore, we calculated the number of melodic direction changes and melodic intervals per metrical position. We focus on phase cues here because
the model and humans did not differ as much in tapping period. The top of Table 1 shows the mean number of melodic direction changes per metrical position,
while the bottom shows the mean number of each melodic interval size per metrical position. We calculated both measures across all excerpts for the versions with both voices. Clearly, more melodic direction changes occur at the first metrical position than at other metrical positions. For the melodic intervals, there are more
semi-tone intervals on the first and fifth eighth-note metrical positions than at other metrical positions. This pattern does not appear for other interval sizes.
Thus, these melodic cues could be influencing subjects' tapping phase. On the other hand, these cues do not influence the model's behavior because it does not
take melodic information into account in determining the importance of note events.
Note. Mean values were calculated across the original (both-voices) eight excerpts from BWV 805, consisting of 64 total measures, and indicate the proportion of the time that a particular event occurred at each metrical position across the two voices.
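Counts of the kind reported in Table 1 could be computed along these lines. The MIDI-pitch grid representation and the convention of crediting a direction change to the turning note's position are assumptions made for the sketch:

```python
# Tally melodic direction changes and semitone intervals at each metrical
# position, assuming a voice given as MIDI pitches on the eighth-note grid.
def phase_cues(pitches, measure_len=8):
    changes = [0] * measure_len
    semitones = [0] * measure_len
    for t in range(1, len(pitches)):
        pos = t % measure_len  # 0-based metrical position of pitches[t]
        if abs(pitches[t] - pitches[t - 1]) == 1:
            semitones[pos] += 1  # semitone interval arriving at this position
        if t < len(pitches) - 1:
            a, b, c = pitches[t - 1], pitches[t], pitches[t + 1]
            if (b - a) * (c - b) < 0:  # up-then-down or down-then-up
                changes[pos] += 1      # direction change at the turning note
    return changes, semitones
```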
DISCUSSION
In summary, we tested human subjects and an oscillatory model of pulse on stimuli derived from a two-voice contrapuntal work by Bach, BWV 805. Subjects
tapped with a similar phase and period, usually with a mode of 4_1, regardless of whether only one of the voices or both voices were present. This suggests that
period and phase information is present in both voices throughout the excerpts we used. The model likely taps in mode 4_3, in anti-phase to humans, because these positions often contain long notes, and because position 5 often contains trills, which the model does not weight highly because they consist of short-duration notes.
Proceedings paper
their working memory. These representations then allow them to name the pitch by judging its
distance from the reference.
It is this aspect of the memory model that led Klein, Coles and Donchin (1984) to predict that the
behavior of an Event Related Brain Potential (ERP) known as the P300 would differ in AP and RP
subjects. Their prediction was derived from a theory of the P300, proposed by Donchin (1981, see
also Donchin and Coles, 1988). This theory views the P300 as a manifestation of the activity of
processing modules invoked if and when deviant stimuli indicate the need for a revision in the scheme
of the operating context that governs the subject's information processing.
Klein et al. (1984) reported a successful confirmation of the prediction. However, as will be seen
below, different investigators obtained conflicting results -- some succeeding in replicating Klein et
al. (1984), and others not. In this paper we describe a study that proposes a resolution to the conflict
by demonstrating that the strategy used for pitch judgment is a crucial element of the task, and by
noting that there is a systematic difference between subjects with AP who do, and do not, possess the
ability to make relative pitch judgments.
The ERP represents electrical activity in the brain, recorded between a pair of scalp electrodes, that
has a specific pattern of positive and negative voltages. These voltages have a consistent time course for each subject, electrode pair, and "eliciting event" (the term "event" is used here instead of "stimulus" because ERPs have been shown to occur even in the absence of a stimulus, such as when an expected stimulus is omitted during a sequence of stimuli; e.g., Sutton et al., 1967). This
temporal consistency allows for the extraction of the ERP from the ongoing EEG activity by means of
a process known as "signal averaging." This technique extracts all activity that is time-locked to a
specific eliciting event and effectively eliminates activity that has no consistent temporal relationship
to the event.
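Signal averaging, as described above, can be sketched as follows; the sample-index representation of events and the epoch window length are assumptions made for the example:

```python
# Sketch of "signal averaging": average EEG epochs time-locked to the
# eliciting event, so event-locked activity (the ERP) survives while
# activity with no consistent temporal relationship averages toward zero.
def average_erp(eeg, event_samples, pre=0, post=300):
    """eeg: list of voltage samples; event_samples: sample indices of events."""
    epochs = [eeg[e - pre:e + post] for e in event_samples
              if e - pre >= 0 and e + post <= len(eeg)]
    n = len(epochs)
    return [sum(epoch[t] for epoch in epochs) / n for t in range(pre + post)]
```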
The ERP is generally viewed as a sequence of "components" (see Donchin, Ritter and McCallum,
1978, for a discussion of the concept of ERP component). Each component is conceptualized as a
manifestation of the activity of a voltage dipole, or an element of neural circuitry activated at a
specific interval following the triggering event. In other words, the components manifest specific
processing needs invoked by the interaction between the occurrence of the eliciting event, the
subject's task and strategy, and the circumstances of the presentation. Thus, ERP components make it
possible to monitor, in addition to the subject's overt responses to events, "covert" processes such as
the updating of memory, and the changing of future strategies.
The ERP component of interest in the present report is the P300, a positive-going voltage change
(hence "P") with a latency of about 300 msec following the triggering event. This component was
discovered by Sutton and his colleagues (1965) and has since been intensively studied (see reviews by
Fabiani et al., 1987, Donchin, 1981, Donchin and Coles, 1988). It is most often elicited in the
so-called "Oddball Paradigm," in which a subject is presented with a sequence of events that can be
classified into two categories. The subject is then assigned a task that cannot be performed without
deciding which events belong to which category. If the probability of occurrence of events in one
category is low (the "rare" events), a P300 will be elicited by these events. The P300 has been
successfully employed in the study of different aspects of pitch processing (e.g., Besson and Faïta, 1995; Besson, Faïta and Requin, 1994; Besson and Macar, 1987; Cohen and Erez, 1991; Cohen,
Granot, Pratt and Barneah, 1993; Crummer, Hantz, Chuang, Walton, and Frisina, 1988; Ford, Roth,
and Kopell, 1976; Granot and Donchin, 1996; Hantz, Crummer, Wayman, Walton, and Frisina, 1992;
Hantz, Kreilick, Braveman and Swartz, 1995; Hantz, Kreilick, Kananen and Swartz, 1997; Klein,
Coles and Donchin, 1984; Paller, McCarthy and Wood, 1992; Verleger, 1990; Wayman, Frisina,
Walton, Hantz and Crummer, 1992).
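The structure of an Oddball sequence, as described above, can be sketched as follows. The event labels and the 20% rare-event probability are illustrative assumptions, not the parameters of any of the cited studies:

```python
import random

# Illustrative Oddball sequence: events from two categories, with the
# "rare" category occurring at low probability.
def oddball_sequence(n, p_rare=0.2, rare="nondiatonic", frequent="diatonic", seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    return [rare if rng.random() < p_rare else frequent for _ in range(n)]

seq = oddball_sequence(200)
```

In an ERP experiment, the subject's task forces a category decision on each event, and the P300 is measured in the average response to the rare events.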
Donchin and his colleagues (Donchin, 1981; Donchin and Coles, 1988) proposed the "Context
Updating" theory to account for the functional significance of the P300. The assumption is that the
P300 is elicited whenever the circumstances require a revision in the model of the operating context
that governs the subject's information processing. For instance, in pitch memory tasks, musicians
without AP must continually compare incoming pitch information with their internal framework of
the scale, or with a representation of a tone they have just heard. Hence, their model of the "auditory
environment" is continually modified to take into account the discrepancies between the actual and the
modeled context.
Conversely, if subjects with AP maintain permanent representations of pitches, they do not update
their mental scheme during auditory judgments, and therefore the rare events in an auditory Oddball
Paradigm will not elicit a P300. This was the prediction that Klein et al. (1984) tested. They presented
both visual and auditory oddball sequences to AP and to RP subjects. Klein et al. (1984) reported that
the amplitude of the auditory P300 was reduced, sometimes almost to zero, in the AP subjects, but not
in the RP subjects. Both groups displayed a normal P300 in response to the rare events in a visual
oddball sequence. Hence, the absence of the P300 in the AP subjects was specific to the auditory
sequences. Furthermore, Klein et al. (1984) reported a negative correlation between the amplitude of
the P300 and the subjects' performance on a pitch-screening test that was administered to each subject
prior to the oddball task. The better the performance on the pitch screening test, or the stronger the
subject's AP skill, the smaller the amplitude of the P300 elicited in that subject by the rare auditory
stimuli. This small P300 amplitude elicited by AP subjects during auditory tasks has commonly been
referred to in the literature as the "AP effect."
Several investigators have reported successful replications of Klein et al. (1984). For instance,
collaborative research efforts between psychologists and music theorists at the University of
Rochester show that musicians with AP also exhibit a reduced P300 in oddballs involving sine tones,
piano tones, musical intervals, and various instrumental timbres. Hantz et al. (1992) investigated
differences in pitch processing in three different groups of subjects including musicians with AP,
musicians without AP, and nonmusicians. The authors presented subjects with two different oddballs: a contour discrimination task consisting of ascending and descending intervals, and an interval discrimination task consisting of minor and major thirds. In both tasks, the AP musicians exhibited smaller P300 amplitudes than either of the other two groups, and significantly smaller amplitudes than the non-AP musicians. This work was corroborated by Wayman et al. (1992), who assessed the effects of
musical training and AP ability on event-related brain activity in response to sine tones. Again, three
different subject groups were used. The task was a simple oddball consisting of 500 and 1000 Hz
tones. The results show that AP musicians exhibited a significantly smaller P300 amplitude than
either the musicians or nonmusicians. Finally, Crummer et al. (1994) examined the perceptual and
neurophysiological differences in subjects with varying musical training during timbre discrimination
tasks. Three stimulus series were used comprising three different types of timbre discrimination
varying in difficulty. These included string instruments in the same family (cello and viola), flutes
made of different material (silver and wood), and like instruments of slightly different size (B-flat
versus F Tubas). Results show that the AP group had smaller P300 amplitudes and shorter latencies
for all three series than the other groups. These results are interesting in that AP subjects were now
showing a reduced P300 in timbre discrimination tasks in addition to pitch discrimination tasks. The
authors suggest that the pitch differences within the harmonics of the note played may account for the
observed waveform differences.
Although these studies concur with the main finding of Klein et al. (1984), they diverge from it in that
they do not find a correlation between AP subject test scores on the pitch screening measure and P300
amplitude. It is important to note, however, that while these authors employed the same pitch
screening measure as the one used by Klein et al. (1984), there are some differences in the exact
implementation of this test. Overall, their screening procedure is more difficult in that the test tones
and inter-stimulus intervals are shorter. Furthermore, these authors did not give the subjects a practice session prior to the test, as Klein et al. (1984) did, and they gave only half credit for semitone errors, as opposed to the full credit given in the Klein et al. (1984) study.
In other instances, attempts to replicate the Klein et al. (1984) experiment yielded contradictory
results. Johnston (1994) found no correlation between the P300 and AP ability. While Johnston (1994)
attempted to replicate the ERP procedure of Klein et al. (1984) as closely as possible, he did employ a
more difficult pitch screening procedure and he added a third subject group consisting of
non-musicians. In both the visual and the auditory oddballs, the non-musician group showed the
largest P300's, followed by the AP group, while the P300's of the non-AP musician group appeared
the smallest. In his attempt to explain the discrepancies between the results of his study and the Klein
et al. (1984) results, Johnston (1994) concluded that AP musicians might show a greater relative
change in P300 over the course of an experimental run. In particular, this might occur if AP ability
makes these subjects more prone to habituate or become more accustomed to the stimuli in the
pitch-discrimination condition. In other words, he suggests that Klein et al.'s (1984) AP subjects may
have habituated to the stimuli faster than those in his study. Generally, this habituation process has
been correlated in the literature with a smaller P300.
Another replication of the Klein et al. (1984) study was performed by Bischoff et al. (1995 --
unpublished data). Bischoff et al. (1995) used the same pitch screening and ERP procedures as Klein
et al. (1984). However, contrary to the data obtained by Klein et al. (1984), Bischoff et al. (1995)
found that the ERPs elicited in the AP musicians were similar in both visual and auditory modalities.
In addition, both Johnston (1994) and Bischoff et al. (1995) found no correlation between the
amplitude or area of the P300 and the percentage of correctly identified tones in the pitch screening
tests. In fact, in Bischoff et al.'s study, some of the AP musicians who scored the highest on the pitch
screening measure also displayed the largest P300's.
Finally, Hantz et al. (1995) presented AP musicians, non-AP musicians, and non-musicians with a
"pitch memory series" which consisted of a target pitch (G4) interspersed throughout a series of
random pitches. They found that all three groups produced some late positive (P300) activity for the
pitch memory series. Hantz et al. (1997) likewise report that AP subjects produce robust P300's during
a melodic and harmonic closure task.
It seems evident from the above survey that there is considerable variance, across subject groups and
laboratories, in the degree to which the P300 is indeed absent in AP subjects. Such variance may be accounted for by subtle differences among the tasks used in the different studies and by individual differences among AP subjects that have not been considered in the design of the studies. Our most
recent AP study examines these possibilities. Specifically, we examine the possibility that the
contradictory results may be due to variability in the degree to which different studies used tasks that
placed a heavy demand upon pitch memory. It is conceivable that the subjects in some of these studies
were not consistently using their absolute pitch capabilities. Hantz et al. (1995) even claim that the
task used in their study was not sufficiently difficult to warrant the use of an absolute pitch strategy.
Hence, in order to observe the differences in processing in AP and RP musicians, we have designed a
more challenging pitch memory task in which the use of the AP ability could be beneficial. At the
same time, the task is not beyond the capabilities of musicians who possess good relative pitch
abilities.
In our most recent AP experiment, subjects (who were screened for both AP and RP ability) were
presented with two auditory oddball tasks, and one visual oddball task that served as a control. In the
first auditory task (pitch memory task) subjects were asked to differentiate between diatonic and
nondiatonic tones within a tonal framework. We assumed that AP subjects would use their AP ability
in this task because of its difficulty level. In the second task (the contour task) subjects were asked to
differentiate between tones moving upwards or downwards. This task would require both AP and RP
subjects to hold each tone in working memory in order to successfully determine the direction of
motion from one tone to the next. Hence, with this task, which is "relative" in nature, we hypothesized
that the RP subjects would perform better, in terms of accuracy and RT, than the AP subjects.
Our research (reported at ICMPC5 and expanded upon since) suggests that the relationship between
the amplitude of the P300 component of the ERP and the AP ability depends on the degree to which
subjects employ their AP ability during a musical task. This, in turn, may be greatly influenced by the
task as well as by the level of the AP ability of the individual subject. For example, those AP subjects
who also possess strong relative pitch skills and who reportedly used a relative pitch strategy when
performing the pitch memory task, elicited a large P300 to the rare, nondiatonic tones. Conversely,
those AP subjects with weaker relative pitch skills showed a smaller P300 to the rare nondiatonic
tones. Thus, the P300 amplitude may serve as a marker of the type of strategy (absolute or relative)
that subjects employ in a given situation.
Second, this research lends further support to the "Context Updating" theory of P300 that assumes that
the P300 is elicited whenever the circumstances require subjects to modify their model of the
operating context. That is, the P300 reflects the maintenance of a model of the environment that is
continually adjusted to take into account the discrepancies between the actual and the modeled
context. Several accounts of AP in the literature claim that AP subjects have a permanent template, or
a set of internal standards, in their memory that contains representations of pitches. Hence, because of
this template, they are able to "fetch the name of a tone without comparing the representation of the
tone they have just heard with a recently fetched representation of a standard." (Klein et al., p. 1306).
RP subjects, on the other hand, are assumed to have a more movable conceptual template, so they
must continually compare incoming pitch information with their representation of the scale. Our
results are consistent with the predictions derived from this model. Substantial support for this model
has also been reported in a recent study by Zatorre et al. (1998). Through the use of positron emission
tomography and cerebral blood flow studies, these authors found that, when listening to tones, AP
subjects show activation of the left posterior dorsolateral frontal cortex while control subjects do not.
On the other hand, they observed activity within the right inferior frontal cortex in controls but not in
AP subjects during an interval task. The authors claim that, "subjects without AP use tonal working
memory in both tasks, but AP possessors may not need access to this mechanism for interval
classification because they are able to classify each note within the interval by name" (p. 3177). They
further claim that, "this conclusion is concordant with the reported absence of the P300-evoked
electrical component..." (p. 3177).
Third, this research has demonstrated that there are indeed varying levels of the AP ability; some AP
subjects have, in addition, strong relative pitch abilities and others do not. These differences are
reflected quite consistently by the behavior of the P300 component. For instance, those AP subjects
who are the slowest and least accurate during the contour task generally show the smallest P300
amplitude to the pitch memory task and perform the least well on the relative pitch test. These
findings corroborate the work of Albert Bachem who presented a detailed classification of the levels
of AP after an extensive study of the ability between the years of 1937 and 1955. They also
corroborate a hypothesis proposed by music psychologist Jay Dowling in 1986:
It seems unlikely that the ability for absolute pitch is bimodally distributed in the population, that is,
that some have it and others do not. It seems more consonant with our experience that people possess
the ability in varying degrees and that whether the ability shows up depends on the particular task
demands the person faces (p. 122).
Furthermore, they lend further support to the call for a multidimensional measure of AP which could
be assessed on the basis of standardized tests. Such a need was first expressed by Takeuchi and Hulse
in 1993. These tests would provide detailed information about subjects' degrees of AP. In addition,
they would reflect performance on both identification and production tasks, as well as sensitivity to
timbre, pitch register, and pitch class. Moreover, results from this study show that researchers need to
exercise caution when selecting subjects for AP experiments. Subjects must be thoroughly screened
and various aspects of their pitch ability need to be assessed.
Finally, this research strengthens the idea that the memory system for pitch and interval distances is
distinct from the memory system for contour. According to Dowling (1978), when melodies are
processed, both interval information and contour information are extracted. However, whereas the
interval information is stored in long-term memory only after a key has been established, the contour
information is extracted immediately, regardless of the key information, but it rapidly decays as the
melody progresses. That is, these two dimensions of melody function independently. This was nicely
demonstrated in the current study. All of the RP and AP subjects exhibited a reduced or even
nonexistent P300 to the rare stimuli in the contour task (all P300's were less than 5 microvolts). If the
RP subjects were actually focusing on specific pitch information, they would have been involved with
relating each pitch to the rest of the musical context and probably would have responded more
profoundly to the rare, nondiatonic tones as the majority of them did in the pitch memory task. In the
case of the AP subjects, when they were asked to focus on specific pitch information in the pitch
memory task, their waveforms were generally larger than those shown to the contour task. This
demonstrates that they, too, are able to ignore specific pitch information when they change their
listening strategies and focus specifically on contour shifts.
Lastly, one trend in the data was that most of the subjects showing a strong AP/reduced
P300 relationship report having relatives who are either musicians or who also possess the AP ability.
These findings are interesting in view of recent reports (described above) by geneticists of a strong
genetic component in the AP ability. The data from the current study show a strong correlation
between the P300 amplitude of the AP subjects and the number of their relatives who are either
musicians or have the AP ability. The subjects with the greatest number of relatives who are
musicians tend to be the ones with the smallest P300 amplitudes to the pitch memory task. Likewise,
the subjects with the greatest number of relatives with the AP ability tend to be the ones with the
smallest P300 amplitudes to the pitch memory task. In the future, it would be interesting to explore
this genetic component further.
References:
Bachem, A. (1955). Absolute Pitch. Journal of the Acoustical Society of America, 27, 1180.
Baharloo, S., Johnston, P.A., Service, S. K., Gitschier, J, and Freimer, N.B. (1998). Absolute Pitch:
An Approach for Identification of Genetic and Nongenetic Components. American Journal of Human
Genetics, 62, 224-231.
Besson, M., and Faïta, F. (1994). Electrophysiological Studies of Musical Incongruities: Comparison
Between Musicians and Non-Musicians. In, Proceedings of the Third International Conference on
Music Perception and Cognition (pp. 41-43), Liège, Belgium: ICMPC.
Besson, M., Faïta, F., and Requin, J. (1994). Brain Waves Associated with Musical Incongruities
Differ for Musicians and Non-Musicians. Neuroscience Letters, 168, 101-105.
Besson, M., and Macar, F. (1987). An Event-Related Potential Analysis of Incongruity in Music and
Other Non-Linguistic Contexts. Psychophysiology, 24, 14-25.
Chouard, C.H., and Sposetti, R. (1991). Environmental and Electrophysiological Study of Absolute
Pitch. Acta Otolaryngol., 111, 225-230.
Cohen, D., and Erez, A. (1991). Event-Related Potential Measurements of Cognitive Components in
Response to Pitch Patterns. Music Perception, 8, 405-430.
Cohen, D., Granot, R., Pratt, H., & Barneah, A. (1993). Cognitive Meanings of Musical Elements as
Disclosed by Event-Related Potential (ERP) and Verbal Experiments. Music Perception, 11/2,
153-184.
Cook, E.W., & Miller, G. (1992). Digital Filtering: Background and Tutorial for Psychophysiologists.
Psychophysiology, 29(3), 350-367.
Crummer, G.C., Walton, J.P., Wayman, J., Hantz, E.C., & Frisina, R.D. (1994). Neural Processing of
Musical Timbre by Musicians, Nonmusicians, and Musicians Possessing Absolute Pitch. Journal of
the Acoustical Society of America, 95/5, 2720-2727.
Donchin, E. (1981). Surprise!... Surprise? Psychophysiology, 18/5, 493-513.
Donchin, E., and Coles, M. G. H. (1988). Precommentary: Is the P300 Component a Manifestation of
Context Updating? Behavioral and Brain Sciences, 11/3, 357-374.
Dowling, W. J. (1978). Scale and Contour: Two Components of a Theory of Memory for Melodies.
Psychological Review, 85, 341-354.
Dowling, W. J., and Harwood, D. L. (1986). Music Cognition, Orlando: Academic Press, Inc.
Ford, J., Roth, W., and Kopell, B. (1976). Auditory Evoked Potentials to Unpredictable Shifts in
Pitch. Psychophysiology, 13/1, 32-39.
Granot, R., and Donchin, E. (1996). An ERP note on Musical Scales: The First Scale Tone is
Processed Differently. (In preparation).
Gratton, G., Coles, M.G.H., & Donchin, E. (1983). A New Method for off-line Removal of Ocular
Artifact. Electroencephalography and Clinical Neurophysiology, 55, 468-484.
Gregerson, P., and Kumar, S. (1996). The Genetics of Perfect Pitch. American Journal of Human
Genetics Supplement, 59, A179.
Hall, D. E. (1982). Practically Perfect Pitch: Some Comments. Journal of the Acoustical Society of
America, 71, 754-755.
Hantz, E.C., Crummer, G.C., Wayman, J.W., Walton, J.P., & Frisina, R.D. (1992). Effects of Musical
Training and Absolute Pitch on the Neural Processing of Melodic Intervals: A P3 Event-Related
Potential Study. Music Perception, 10/1, 25-42.
Hantz, E.C., Kreilick, K.G., Braveman, A.L., & Swartz, Kenneth P. (1995). Effects of Musical
Training and Absolute Pitch on a Pitch Memory Task: An Event-Related Potential Study.
Pyschomusicology, 14, 53-76.
Hantz, E.C., Kreilick, K.G., Kananen, W., and Swartz, K.P. (1997). Neural Responses to Melodic and
Harmonic Closure: An Event-Related-Potential Study. Music Perception, 15/1, 69-98.
Johnston, P. A. (1994). Brain Physiology and Music Cognition. Unpublished Doctoral Dissertation,
University of California, San Diego, California.
Klein, M., Coles, M.G.H., and Donchin, E. (1984). People with Absolute Pitch Process Tones without
Producing a P300. Science, 223, 1306-1309.
Krumhansl, C. L. (1979). The Psychological Representation of Musical Pitch in Tonal Context.
Cognitive Psychology, 11, 346-374.
Lockhead, G.R., and Byrd, R. (1981). Practically Perfect Pitch. Journal of the Acoustical Society of
America, 70, 381.
Paller, K.A., McCarthy, G., and Wood, C.C. (1992). Event-Related Potentials Elicited by Deviant
Endings to Melodies. Psychophysiology, 29/2, 202-206.
Rakowski, A. (1978). Investigations of Absolute Pitch. In E. P. Asmus, Jr. (Ed.), Proceedings of the
Research Symposium on the Psychology and Acoustics of Music (pp. 45-57). Lawrence: University of
Kansas Division of Continuing Education.
Rakowski, A., and Morawska-Bungeler, M. (1987). In Search for the Criteria of Absolute Pitch.
Archives of Acoustics, 12, 75-87.
Schlaug, G., Jancke, L., Huang, Y., & Steinmetz, H. (1995). In Vivo Evidence of Structural Brain
Asymmetry in Musicians. Science, 267, 699-701.
Siegel, J. (1974). Sensory and Verbal Coding Strategies in Subjects with Absolute Pitch. Journal of
Experimental Psychology, 103, 37.
Stumpf, C. (1883). Tonpsychologie I. Leipzig: Hirzel.
Sutton, S., Tueting, P., Zubin, J., & John, E.R. (1967). Information Delivery and the Sensory Evoked
Potential. Science, 155, 1436-1439.
Takeuchi, A.H., & Hulse, S.H. (1993). Absolute Pitch. Psychological Bulletin, 113/2, 345-361.
Verleger, R. (1990). P3-evoking Wrong Notes: Unexpected, Awaited, or Arousing? International
Journal of Neuroscience, 55, 171-179.
Ward, W.D., and Burns, E.M. (1982). Absolute Pitch. In D. Deutsch (Ed.), The Psychology of Music
(pp. 431-451). New York: Academic Press.
Wayman, J.W., Frisina, R.D., Walton, J.P., Hantz, E.C. and Crummer, G.C. (1992). Effects of
Musical Training and Absolute Pitch Ability on Event-Related Activity in Response to Sine Tones.
Journal of the Acoustical Society of America, 91/6, 3527-3531.
Zatorre, R.J., Perry, D.W., Beckett, C.A.,Westbury, C.F., and Evans, A.C. (1998). Functional
Anatomy of Musical Processing in Listeners with Absolute Pitch and Relative Pitch. Proceedings of
the National Academy of Science, 95, 3172-3177.
Proceedings abstract
RECONSTRUCTING MELODIES: HOW CHILDREN FROM INDIA AND THE U.S. PARSE CULTURALLY
FAMILIAR AND UNFAMILIAR MELODIES.
kvv@mediaone.net
Background:
Melodic pattern recognition has been studied in terms of contour similarity and
the principles of expectation enhanced by implied harmonic context. However,
the developmental aspects of our ability to recognize a simple tune are not
fully understood. Bamberger's work with children re-arranging tunes with
musical bells showed that training affects the degree to which a sense of motor
efficiency enters into performance for different age groups. Dowling's work on
memory for contour and the role of rhythm has also shed light on the age at
which the recognition of a tune becomes stable in a musically ambiguous
context.
Aims:
Method:
Two groups of elementary school children, one from the United States and one
from New Delhi, were given individual instruction in using a laptop computer to
perform tunes on musical blocks. The blocks were of a variety of shapes and
were assigned to segments of a sampled melody such that no identifiable pattern
of blocks would imply any melodic pattern. Familiar melodies were considered
"in-culture", such as "Twinkle, Twinkle Little Star." The unfamiliar,
"out-of-culture" melodies were taken from the simple repetitive working songs
of Kaluli women of Papua New Guinea. Blocks were randomly distributed on the
screen before each subject trial. Subjects were then allowed to click and listen,
drag and place, or select and throw away any tone blocks, such as duplicate
notes or sections. Unlimited time was allowed. The subject finally performed
the piece by clicking the mouse on each of the musical blocks in sequence.
Play-backs and revisions were allowed until the child was satisfied. Every
interaction event and all performances were stored for analysis.
Results:
Final layout of the musical segments fell into several categories of spatial
arrangement, ranging from circular to a staircase shape and, in the case of
"Twinkle, Twinkle", a contour which followed the melody. There was no significant
Conclusions:
Proceedings paper
Traditionally, accounts of musical learning and musical products have focussed both on a restricted conception of the
possibilities of musical knowledge and on the explication of what may be seen as lower-order cognitive
processes (J. McPherson, 2000; G. McPherson & Thompson, 1998; Swanwick, 1998). Consequently, descriptions of
musical products such as composition and performance have been restricted in their explanatory power to interpretations of
competence based on issues such as "technique", "craft" and "quality" that fail to provide depth and a convincing notion of
musical excellence (NSW Board of Studies, 1999; Williams, 1999).
The paper proposes a framework for analysing the construction and use of musical knowledge based upon a four-level
model of cognition (Cantwell, in preparation). This model specifies four interactive components underlying learner activity
in any domain (see Figure 1): an operative component descriptive of the real time cognitive operations utilised in the
process of learning; a regulative component descriptive of those processes used in planning, controlling and regulating the
learning processes; a construct component descriptive of the beliefs and understandings about learning that act to drive the
regulative activity; and an efficacy component descriptive of situationally induced competency judgements influencing the
quality of engagement and volitional behaviours.
Figure 1 : Four level model of cognition underlying the current conceptualisation of musical learning (Cantwell, in
preparation)
Conceptually, the regulative, construct and efficacy components represent different aspects of metacognitive knowledge,
while the operative component represents implementation at the cognitive level. It is contended in this model that what
occurs at the operative level is both driven by and reflective of decisions made at the metacognitive level of task analysis.
The process of metacognitive decision making is further presumed to include interactions between the three metacognitive
components. Efficacy judgements, for example, are likely to predict qualities of task engagement through the situationally
determined judgement of potential competence in addressing and completing the task. Such decisions are mediated by
construct level conceptions of task and task requirements. For example, individuals can be said to approach learning with an
array of understandings of learning and expectations about learning. Individual theories of knowledge and knowing (eg.
epistemological beliefs; beliefs about intelligence; self-regulatory knowledge; depth and breadth of domain knowledge), as
well as individual theories of self as learner (eg. motivational goals, attributional beliefs, efficacy beliefs) all contribute in a
situationally specific way to determine both the direction and form of task engagement, and through this, the quality of
regulative activity in controlling real time learning.
Musical learning, production, performance and assessment are argued in this paper to be explicable through consideration
of processes involved at the operational, regulative, construct and efficacy levels of human cognition. Whilst
acknowledging the uniqueness of the knowledge base of music, this model nonetheless situates musical excellence within a
common framework of learning in other domains. Empirical support for this proposition will be drawn from three areas:
from research into the planning processes of musicians (Cantwell & Millard, 1994, Sullivan & Cantwell, 1999); from
research into the composing processes of musicians (Irvine & Cantwell, 1999; Irvine, Cantwell & Jeanneret, 1999), and
from research into the assessment of musical thinking (Jeanneret, 1999).
Strategic processes in musical planning
In recent work, Cantwell and Sullivan (Cantwell & Millard, 1994; (Millard) Sullivan & Cantwell, 1999) have investigated
the planning processes of both novice and experienced musicians in learning new score. In both studies, evidence indicated
that factors emerging from the construct level in particular provided the strongest pointers to higher level strategy use
associated with a higher quality planning focus.
In their original study, Cantwell and Millard (1994) speculated that the level at which musicians form intentions in learning
new music may relate to factors beyond the ability to simply read or decode musical score. The processes involved in
learning new music were conceptualised by Cantwell and Millard to be comparable to those involved in the reading and
comprehension of text material. Thus, it was argued, the shift from decoding to comprehending in text learning may be
paralleled in the shift from notational decoding to meaning construction in learning new music. Understanding the quality
of planning in learning new music, then, necessarily involves acknowledging different potential levels
of meaning and, through this, the potential use of increasingly complex strategic repertoires.
We may conceptualise individual differences in musical learning, then, as reflective of an interaction between underlying
musical epistemologies, prior knowledge states and situationally induced motivational states. Musical epistemology refers
to the individual's conceptualisations of both the structure of musical knowledge and the possibilities of musical knowledge
(that is, both the limits of what I "know" about music and the limits of what it may be possible to "know" at some
indeterminate future point). Additionally, musical epistemology refers to conceptions of how such knowledge has been and
may be constructed. That is, conceptions of music incorporate a strategic as well as substantive component.
In Cantwell and Millard's (1994) work, reported approaches to learning (Biggs, 1987) were seen as indicative of underlying
motivational and epistemological dispositions. Typically, students will report a disposition towards a deep or surface
approach to learning. A deep approach represents a combination of intrinsically derived learning goals with a strategic bias
towards the construction of complex and highly personalised meanings. A surface approach, on the other hand, represents a
combination of extrinsically derived learning goals with a strategic bias towards the reproduction of conventional categories
of the target knowledge. Cantwell and Millard's (1994) data from 8th Grade music students revealed that the surface/deep
distinction discriminated between individuals in terms of the structural complexity of the planning processes and the
strategic complexity associated with the production of such plans. Two examples from a deep oriented student and a surface
oriented student learning the same score may illustrate:
I could handle this ... so I would probably be sure that I am playing the correct notes and can pitch them
straight away ... and then build on that trying to make the piece musical. I'd do this by smoothing it all out,
playing the notes evenly, being expressive ... I mean following the dynamics or doing my own if it sounds too
boring - and yeah, I'd learn it first and then add to it, although I could probably do both at once. (Deep
oriented student, Cantwell & Millard, 1994, p 58)
I'd just ... um ... keep playing it, and if I got any mistakes I'd just fix it up ... like I'd keep going over the same
spot if I kept getting it wrong. Just like that I s'pose ... 'til I got it right. (Surface oriented student, Cantwell &
Millard, 1994, p 56)
In these instances, the differences in planning focus appear quite glaring. For the deep oriented student, learning the score
involved overlaying the notational elements with musical meaning - " trying to make the piece musical". In both
understanding of the nature of music, and in variety of strategies called upon to put in place such meaning, the deep
oriented student indicated a much more complex musical epistemology than was evident in the reproductive focus of the
surface oriented student ("I'd keep going over the same spot if I kept getting it wrong").
In a second study, (Millard) Sullivan and Cantwell (1999) tested a causal model in which prior knowledge and approaches
to learning were predicted to influence the complexity of strategy use and through this, the quality of planning focus.
Fifty-three tertiary music students reported on the way they would learn two scores to the point of performance
competence. In order to control for the effect of pattern identification in conventional notation, one score was presented in
20th Century graphic notation. Measures of approach to learning were also taken. Verbal protocols provided by the
participants were analysed for the presence of low level, mid level and high level strategies (see Table 1), and were
categorised in relation to the level of musical meaning aspired to in planning (see Table 2). For both the traditionally
notated score and the graphically notated score, clear relationships between approach to learning, complexity and level of
strategy use and quality of focal planning were observed. Those musicians reporting (at the construct level) a deep approach
to musical learning utilised a greater array of both mid level and high level strategies in constructing musical meaning, and
focussed planning at higher levels of abstraction than was the case for musicians reporting a surface approach to music
learning.
Examination of protocols from the students in Sullivan and Cantwell's study revealed that the epistemological assumptions
underlying a deep approach for these more expert subjects involved more complex and more abstract levels of musical
understanding. Where in the Cantwell and Millard study higher quality responses found points of integration in the music,
planning was nonetheless constrained to the parameters of the score itself. While this more limited epistemology was also
evident in responses of some of the Sullivan and Cantwell subjects (eg. Levels 3 and 4 in Table 2), some students displayed
significant development of epistemological understanding, allowing for constructed meanings to move beyond the literal
interpretations imposed by the attributes of the scores themselves. The exchange between the interviewer and subject 35 in
Sullivan and Cantwell's data illustrates this:
So you would learn this one differently from the other one. You didn't start at the beginning and then the end
lines and work in really slowly as you said for the other one.
Not for this sort of performance, no. I don't think its ... probably in about fifty years time to be exact, this bit of
notation here is the way music continues to grow. We'll be sitting down and looking at the semi-breves, and
things we have now like in a classroom, with spiked hair and big, huge, green fingernails hanging off, and
some strange instruments no-one's ever seen and someone will say this is how music's done in the twentieth
century, and everyone will be totally bamboozled by the fact that we have two minims making up a semibreve,
and four quavers making up... yeh, the exactness of it all. Music evolves, music notation has evolved or is
continuing to evolve. (S35, graphic score)
In short, the research reported by (Millard) Sullivan and Cantwell provides significant evidence for an emphasis on factors
not conventionally considered in explaining musical competence. While all musicians in their studies had attained a high
degree of technical competence in terms of notational fluency, it was still possible to discriminate between musicians on the
basis of the structural complexity of their planning processes, which in turn were seen to reflect fundamental differences
both in the complexity of the strategic repertoire underlying the planning and, importantly, in the driving epistemological
assumptions underlying intention formation and implementation.
Table 1: Strategic behaviours evident in the planning behaviours of musicians engaging new score (from Sullivan &
Cantwell, 1999)
Low-level strategies:
- Association: linking of two or more musical elements without transforming the musical meanings.
- Rote learning: reproducing larger or smaller units of music with the aim of memorisation. Does not involve transformations of meaning.
- Trial and error: reasonably unsystematic strategy selection, which is persevered with until unsuccessful and then replaced.

Mid-level strategies:
- Speed alteration: approaching the learning of a piece at a tempo slower than that which is set or preferred.
- Chunking: sorting smaller, relatively unmeaningful units of musical information into larger, more meaningful units.
- Linking: referencing new musical information to prior knowledge (e.g. composer, genre or style, known terms and symbols).

High-level strategies:
- Interpretation: imposing meaning on small parts, sections, or the whole of a piece, thereby transforming the score into something original and meaningful.
- Patterning: searching for underlying themes, ideas, styles, variations and other less obvious structures so that a clearer understanding of the whole piece may be gained.
- Prioritising: sorting relevant and important musical information from less relevant and less important information into a hierarchy so that the goal may be achieved in an orderly fashion.
Table 2: Categorisation of focal planning levels of musicians engaging new scores (from Sullivan & Cantwell, 1999)
LEVEL 2: Focus on discrete elements of musical information (e.g. notes, dynamics, rhythms) without considering overall intent and design of the music.

Are you familiar with this type of music?
No.
How would you go about learning this one?
I don't know. I'd have to learn what all the symbols meant.

LEVEL 3: Still a focus on components, but there is evidence of linking elements, and of prioritising more important elements above less important (e.g. melody over note-learning).

How would you go about learning this piece for a performance?
I'd probably go through it once just to know where all the melodies and everything go and find the difficult passages, work on those bits and then try and put it all together, and with the difficult passages, like work on them slowly. (S04, trad. score)

LEVEL 4: Focus shifts to the full score, but is limited in perspective to the prescriptions of the score rather than its transformation into an original interpretation.

If you had to learn this for a big performance, what steps would you take to do that?
I'd look at the whole piece overall, just to see the main style of it and what common features are throughout the whole thing, ... look at the time signature, look for accidentals ... I'd probably be fingering part of it too while I was doing that .... Then I'd probably play it through once, then I'd go back and fix mistakes bar by bar, then putting it in the context of the bar, then the next bar, then put them together, keep going. Then the parts where I had more problems just work on them again, slower .. (S36, trad. score)

LEVEL 5: A transitional level in which the musician acknowledges the need to consider underlying features of the score (new interpretations, patterns, ideas, stylistic traits and so forth), but the focus of planning still remains on the prescriptive elements of the score.

So this is a different sort of score. How would you go about approaching this one?
To make this score, or as I was reading it, the different lines suggested different feelings or bursts of energy, or longer drawn out periods. As for the first line, relating to the pitch I suppose, it'd be a longer pitch and then suddenly down and long, and then it would be boom, boom, boom, three big bursts of energy. To me it would be more a process of feeling my way around the piece, getting a feel for each different line. (S13, graphic score)

LEVEL 6: The focus here is on the meaning of the music - what it is trying to say, what the composer intended, what the performer wishes to express. The learning of the components of the score now serves to scaffold more creative and interpretative emphases.

How would you go about planning to learn this piece for a performance?
When I first see it you just read it through, learn, have a look to see how you'd have to play it like whether it goes up or down and what kind of movement it is, arpeggios or scales, or how fast or slow, just general kinds of things. Then, when you've got an idea, I'd just play it through, master the notes, and then, depending upon what style of music it was, work out what kind of interpretation would suit it best, like which kind of expressive techniques you should use kind of thing. (S03, trad. score)

LEVEL 7: Here the musician develops the meaning of the music to incorporate other domains, to discuss the music in abstract and philosophical terms, to consider audience reactions, and to propose highly original ideas and interpretations.

So you would learn this one differently from the other one. You didn't start at the beginning and then the end lines and work in really slowly as you said for the other one.
Not for this sort of performance, no. I don't think its ... probably in about fifty years time to be exact, this bit of notation here is the way music continues to grow. We'll be sitting down and looking at the semi-breves, and things we have now like in a classroom, with spiked hair and big, huge, green fingernails hanging off, and some strange instruments no-one's ever seen and someone will say this is how music's done in the twentieth century, and everyone will be totally bamboozled by the fact that we have two minims making up a semibreve, and four quavers making up... yeh, the exactness of it all. Music evolves, music notation has evolved or is continuing to evolve. (S35, graphic score)
which the composer addresses the task, and the strategic variation implied by that process. Attention at the extreme
categories of the taxonomy (Categories 1 & 2 and Categories 6 & 7) may be associated with more explicitly metacognitive
activity. The central categories are more representative of operational level activity in "real time" composing.
Attentional shifts, then, between the extreme categories and the central categories may be seen as indicative of reference to
the more abstract construct level understandings driving decision-making at the regulative and process levels.
By way of illustration, the attentional foci of the three composers over the time of composing are shown in Figures 2, 3
and 4. The X-axis on these illustrations indicates the category of attentional focus at any given point in time; the Y-axis
indicates real time.
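The kind of tabulation underlying such figures can be sketched in a few lines. This is a hypothetical illustration only, not the authors' analysis code: the function name `time_per_category`, the category log, and the session timings are all invented, and the grouping of categories 1-2 and 6-7 as metacognitive follows the taxonomy described above.

```python
# Hypothetical sketch: tabulating attentional focus from a coded protocol log.
# Each log entry is (minutes_elapsed, category); a category is assumed to hold
# until the next entry. Categories 1-2 and 6-7 are the "extreme" (metacognitive)
# categories; 3-5 are the central (operational) categories.

def time_per_category(log, total_minutes):
    """Return minutes spent in each attentional category."""
    minutes = {}
    for i, (start, cat) in enumerate(log):
        end = log[i + 1][0] if i + 1 < len(log) else total_minutes
        minutes[cat] = minutes.get(cat, 0) + (end - start)
    return minutes

# Invented example log for a 30-minute composing session.
log = [(0, 1), (4, 4), (12, 3), (18, 4), (24, 6), (28, 7)]
minutes = time_per_category(log, 30)
metacognitive = sum(m for c, m in minutes.items() if c in (1, 2, 6, 7))
print(minutes)        # minutes per category
print(metacognitive)  # total time in the extreme (metacognitive) categories
```

A wider spread of visited categories, and a larger metacognitive share, would correspond to the more expert-like profiles described below.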
The most notable observation from these illustrations is the increasing spread of categories involved in the composition
process as one moves from the more novice-like to the more expert-like composer. Consistent with expectations, the novice
(Case 1) gave no explicit attention to the global aspects of either problem representation or evaluation. That is, the
composition activity was closely tied to the act of composing as a regulated activity, but one which, in all likelihood,
involved little reference to higher order elements of meaning and understanding. The limited amount of monitoring is more
indicative of reflection on the surface features of the composition than on the overall meaning and direction of the
composition. Case 5, as an example of intermediate level problem-solving, reflected many of the focal and strategic
attributes of the novice. However, the processes differed in the nature of initial planning and, to a lesser extent, in the use of
monitoring and evaluation. The composer explicitly reflected upon the nature of the problem as a starting point, and then
focussed on a trialing strategy for the majority of the time spent composing. The expert composer, on the other hand, while
also giving initial time to representing the musical problem, spent a far greater proportion of time compared to the more
novice-like composers, in deliberative planning and in monitoring and evaluating.
On the basis of these case studies, experts in composition appear to be distinguished from novices in the range of attentional
foci used, in the cyclic nature of the attentional focus, in the depth of processing and in the level and variety of strategic
planning brought to bear on music composition. These case studies lend support to the view that higher-order thinking
underpins the capacity to regulate highly complex musical thought.
Assessment of musical thinking
The previous sections have emphasised the complex nature of musical learning. In particular, the research into planning and
into composing has indicated a significant role for higher-order cognitions in explaining the form and quality of musical
outcomes. Our overall theoretical framework suggests that explanations of higher order competencies lie in the nature of the
understandings and beliefs constructed about learning (or music) that act to drive the process of learning itself. It seems,
therefore, a reasonable position to suggest that the assessment of musical outcomes ought also to reflect such a range of
conceptual complexity as has been illustrated in the processes musicians employ to create music.
Assessment processes and criteria for musical products currently in operation in the New South Wales Higher School
Certificate (HSC) have, until recently, been based on norm-referencing. Criteria drawn from syllabus outcomes exist for all
aspects of the testing procedure including composition, performance and sight reading, but they have been organised into
five bands of "ranking descriptors" that dictate a normal curve based on the comparison of students (Board of Studies,
1999). There has been a gradual move towards criterion referencing and the use of benchmarks for the training of
examiners, due for full implementation in 2001, when student achievement will be reported on a single set of
performance bands covering all aspects of the testing procedure including performance, composition, musicology and
aural. The development of criteria for the new system has brought to light certain inadequacies of the current system of
descriptors. This is not to call into question the fairness or credibility of past assessment. Rather, it is through the constant
development and refinement of criteria that the lack of a framework explaining musical products from a convincing
research base has become more apparent.
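The practical contrast between the two marking schemes can be illustrated with a toy computation. This is a sketch only: the marks, band quotas, and cutoffs are invented and do not reproduce the Board of Studies procedure.

```python
# Illustrative contrast between norm-referenced and criterion-referenced banding.
# All numbers are invented for the sketch.

def norm_referenced_bands(marks, quotas):
    """Assign bands by rank: the top quota of students gets the top band, and so
    on, forcing a fixed distribution regardless of the marks themselves."""
    ranked = sorted(marks, reverse=True)
    bands, i = {}, 0
    for band, quota in enumerate(quotas, start=1):
        for mark in ranked[i:i + quota]:
            bands.setdefault(mark, band)
        i += quota
    return [bands[m] for m in marks]

def criterion_referenced_bands(marks, cutoffs):
    """Assign bands against fixed achievement cutoffs: every student who meets
    a criterion earns that band, however many students that turns out to be."""
    result = []
    for mark in marks:
        band = len(cutoffs) + 1
        for b, cutoff in enumerate(cutoffs, start=1):
            if mark >= cutoff:
                band = b
                break
        result.append(band)
    return result

marks = [92, 88, 85, 70, 64]
print(norm_referenced_bands(marks, quotas=[1, 2, 2]))       # -> [1, 2, 2, 3, 3]
print(criterion_referenced_bands(marks, cutoffs=[85, 70]))  # -> [1, 1, 1, 2, 3]
```

Note that three students clear the top criterion (marks of 85 or above) under criterion referencing, whereas the ranking quotas admit only one of them to the top band: the shift to criterion referencing is precisely what makes explicit descriptors of musical achievement, rather than relative standing, necessary.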
The descriptors for composition, for example, are largely skill-based and struggle to acknowledge musical thought in words
such as "musically convincing", "sustained involvement in the composition process", and "convincing development of
ideas" (see NSW Board of Studies, 1999, p137). Such phrases can be subject to varying interpretations that give rise to
discrepant results. This lack of a framework for the analysis and assessment of music knowledge is not confined to the
assessment of high school students. Throughout the data reported by Williams (1999) in her interviews in the tertiary
sector, the same emphasis on skill exists, with a number of references to the "craft" of composition and "technical
proficiency" without further explanation and little acknowledgment of the part cognition might play in this development.
While the notion of "originality", implying high level thought, is recognised, it would seem that skill criteria are easier to
articulate in an explanation of musical excellence than the constituents of "originality". Similar issues exist in the area of
performance. The HSC criteria for "outstanding" include reference to "sophisticated self-expression" and "a musically
sensitive and personal interpretation" (NSW Board of Studies, 1999, p.136), while Williams (1999) refers to the assessment
of performance often coming down to judgements about "musical" versus "unmusical" playing (p.29). These are words
musicians seem to understand intuitively but are often at a loss to explain in more quantifiable terms without breaking a
work down into its (often skill-based) component parts and thereby losing the sense of "whole" that is fundamental to
understanding the effect of the work (Jeanneret, 1999).
Clearly, high levels of cognitive and metacognitive musical competencies are built into notions such as "sophisticated" and
"musical", but it seems to us that this is not necessarily reflected in the current language of musical assessment. Evidence
from both Cantwell and Sullivan's research and from Irvine's research would suggest that reference to the nature of
higher order cognitions is fundamental to discriminating between different qualities of musical outcome. We may propose,
for example, that a more sophisticated musical epistemology (as an example of a construct level belief) would acknowledge
the possibility of new expressive forms within existing musical genre (as a possible meaning of "sophisticated"), and, when
combined with an appropriate motivational state and strategic repertoire, would give rise to greater complexity of musical
thought demonstrated in the quality of the musical product (see for example, the student protocol in Level 7 of Table 2
above).
We would suggest that the four-level model of cognition provides a comprehensive theoretical base through which the
underlying cognitive processes, and the associated driving beliefs and understandings responsible for the production of
musical learning, may be more comprehensively and validly assessed.
Note:
1. The majority of Year 12 students in New South Wales sit for the Higher School Certificate (HSC), a public examination,
upon the results of which students depend for entrance into tertiary institutions and courses. The certificate students receive
documents both the results of the examination and an assessment mark provided by the school for each subject. Students
may elect one of two music courses to present as part of their HSC and are able to specialise in areas of performance,
composition and musicology over and above certain core requirements. All students must perform a minimum of two
pieces, and the submission of a composition is mandatory for students who elect Music 2.
References
Biggs, J. (1987). Student Approaches to Learning and Studying. Hawthorn: ACER.
Board of Studies, NSW (1999). 1998 HSC Examination Report: Music. Sydney: Board of Studies.
Cantwell, R. (in preparation). A framework for analysing student learning.
Cantwell, R. & Millard, Y. (1994). The relationship between approach to learning and learning strategies in
learning music. British Journal of Educational Psychology. 64, 47-65.
Cantwell, R. & Moore, P. (1996). The development of measures of individual differences in self-regulatory
control and their relationship to academic performance. Contemporary Educational Psychology. 21, 500-517.
Cholowski, K., & Chan, L.K.S. (1994). Knowledge-driven problem solving models in nurse education.
Journal of Nursing Education. 34 (4), 148-154.
Irvine, I. (1999). Musical composition, learning and assessment. Sounds Australian. 53, 31-33.
Jeanneret, N. (1999). Music assessment: What happens in the school sector? Sounds Australian. 53, 18-21.
Lawson, M. (1991). Managing problem-solving. In Biggs, J. (Ed.). Teaching for Learning. Hawthorn: ACER.
McPherson, J. (2000). Assessing musical performance: Current views and practice. Unpublished paper,
Faculty of Education, University of Newcastle.
McPherson, G. & Thompson, W.F. (1998). Assessing music performance: Issues and influences. Research
Studies in Music Education 10, 12-24.
Reitman, W. (1965). Cognition and Thought. New York, Wiley.
Salomon, G. & Globertson, T. (1987). Skill may not be enough: The role of mindfulness in learning and
transfer. International Journal of Educational Research. 11, 623-638.
Swanwick, K. (1998). The perils and possibilities of assessment. Research Studies in Music Education 10,
1-11.
Sullivan, Y. & Cantwell, R. (1999). The planning behaviours of musicians engaging traditional and
non-traditional scores. Psychology of Music. 27, 245-266.
Williams, C. (1999). Questions on assessment in the tertiary sector. Sounds Australian. 53, 25-31.
Proceedings abstract
Dr Ian Cross
ic108@cus.cam.ac.uk
Background:
Aims:
The aim of this paper is to employ the results of an empirical study conducted
in Northern Potosí, Bolivia (in collaboration with an ethnomusicologist, Henry
Stobart) to examine critically current theories of music cognition and
cognitive anthropology and to outline a framework within which a generalised
model of the cultural dynamics of music cognition might be developed.
Main contributions:
Implications:
Proceedings paper
Flute 2.92 (1.68) 2.70 (1.59) 2.71 (1.29) 2.77 (1.71) 3.04 (1.86) 3.04 (1.70) 2.61 (1.57) 2.76 (1.56) 2.36 (1.51)
Violin 2.96 (1.14) 3.60 (1.53) a 3.56 (1.52) 2.93 (1.33) 3.30 (1.55) 3.54 (1.55) 3.33 (1.56) * 3.23 (1.60) 2.85 (1.48) ** b
Piano 2.18 (1.48) 2.45 (1.41) 2.19 (1.42) 2.43 (1.52) 3.48 (1.91) aa 2.59 (1.76) bb 2.22 (1.30) 2.67 (1.52) 2.36 (1.38)
Drums 4.80 (1.41) 4.17 (1.75) a 4.43 (1.58) 4.82 (1.47) 3.33 (1.92) aa 4.17 (1.82) bb 4.27 (1.77) 4.13 (1.77) 4.46 (1.67) *
Guitar 3.49 (1.72) 3.43 (1.67) 3.27 (1.52) 3.71 (1.59) 3.70 (1.37) 3.54 (1.59) 4.15 (1.49) 3.97 (1.50) 4.09 (1.43)
Trumpet 4.65 (1.05) 4.66 (1.26) 4.78 (1.41) 4.36 (1.26) 4.15 (1.41) 4.06 (1.29) 4.24 (1.43) 4.42 (1.39) 4.31 (1.27)
Table 2: Mean rank (and standard deviation) of male participants' preferences for playing specific instruments at Time 1, Time 2 and Time 3 according to school clusters, and
results of comparisons
Flute 4.49 (1.29) 4.74 (1.37) 4.64 (1.24) 4.28 (1.65) 4.18 (1.55) 4.60 (1.40)bb 4.28 (1.64) 4.35 (1.69) 4.24 (1.54)
Violin 4.65 (1.30) 4.89 (1.20) 5.02 (1.17) 4.43 (1.43) 4.56 (1.44) 4.34 (1.45) 4.61 (1.32) 4.56 (1.35) 4.79 (1.37)
Piano 3.12 (1.54) 3.75 (1.44) 3.02 (1.61)bb 3.91 (1.59) 4.07 (1.59) 3.75 (1.48) 3.40 (1.79) 3.41 (1.68) 3.37 (1.62)
Drums 2.65 (1.39) 1.87 (1.06) a 2.43 (1.45) b 2.78 (1.72) 2.19 (1.59) aa 2.50 (1.63) 2.75 (1.56) 2.76 (1.71) 2.58 (1.59)
Guitar 2.16 (1.56) 2.38 (1.41) 2.09 (1.18) 2.33 (1.37) 2.86 (1.44) a 2.64 (1.63) 2.57 (1.56) 2.39 (1.24) 2.58 (1.49)
Trumpet 3.71 (1.56) 3.24 (1.38) 3.52 (1.33) 3.12 (1.35) 3.02 (1.41) 3.12 (1.57) 3.89 (1.39) 3.52 (1.47) 3.42 (1.45)
Discussion
We reported previously that there was an immediate effect of providing counter-stereotypical role-models on boys' and girls' preferences for gender-consistent and
gender-inconsistent instruments (see Harrison & O'Neill, in press, for discussion of results). The present study indicated some lasting influence of the intervention concerts on
the children's instrument preferences. Boys who saw a woman play guitar continued to like it less, and liked violin less regardless of whether they had seen a man or woman
play it. The increase in their liking for drums did not last. Girls still liked violin and drums more (after seeing a man or woman play the instruments), and flute less (after seeing
a man play it) seven months after the concerts. However, the decrease in girls' liking for piano (after seeing a man play it) did not last.
These results suggest that the decrease in liking for same-sex instruments did to some extent endure over seven months. Thus, we found more enduring effects of providing
role-models compared to other studies designed to challenge children's gender-typed beliefs. However, our study suggests that exposing children to counter-stereotypic
role-models is not necessarily effective in either the short- or long-term. For example, girls actually liked flute less after seeing a man play it, and this effect was still evident
seven months later. Other studies in the literature have reported negative effects of such strategies. For example, eighth grade boys expressed more stereotypical attitudes about
women's roles after watching counter-stereotypic advertisements compared to stereotypic commercials. Guttentag and Bray (1976) presented participants with films and
stories about women with counter-stereotypic roles. Whilst female participants' stereotypic attitudes decreased significantly, male participants in fifth and ninth grade showed
more stereotypical attitudes following presentation of the role-models.
Our data suggest that children were more likely to change their preferences for particular instruments than the stereotype itself. Thus, the findings indicate that presenting
counter-stereotypic role-models may be an ineffective strategy for challenging children's gender-stereotyped beliefs. More research is needed before we will have a better
understanding of the best way to present instruments to children if we are to assist them in overcoming the tendency to restrict instrument choice along grounds of gender. It
may be that role-models should be presented separately in single-sex teams to overcome the negative effect we observed in our study (see also Matteson, 1991). In the
mainstream gender literature, longer and more interactive intervention studies have been more successful (e.g., Koblinsky & Sugawara, 1978; Weeks & Porter, 1983). Despite
various methodological problems, Tarnowski's (1993) study suggested that gender-neutral instrument presentation workshops (over a period of eight weeks) may be beneficial
in encouraging children to hold gender-neutral beliefs about the gendered nature of musical instruments. We propose that it may not be enough to merely expose children
passively to musicians that either do or do not conform to their stereotypes, rather it may be necessary for children to engage actively in considering the role of gender in
playing instruments through special programmes designed to raise their awareness and develop more gender-neutral attitudes.
References
Abeles, H. F., & Porter, S. Y. (1978). The sex-stereotyping of musical instruments. Journal of Research in Music Education, 26, 65-75.
Bruce, R., & Kemp, A. (1993). Sex-stereotyping in children's preferences for musical instruments. British Journal of Music Education, 10, 213-217.
Carter, B., & Patterson, C. J. (1982). Sex roles as social conventions: the development of children's conceptions of sex-role stereotypes. Developmental Psychology, 18,
Proceedings paper
Geoff Luck,
Keele University.
Introduction
The conductor of a classical instrumental ensemble traditionally uses a small white baton to describe
certain patterns that represent the rhythm of the music being played by the ensemble. Each rhythm has a
different pattern, and these patterns can be modified to indicate expressive features of the music (Rudolf,
1980).
For the last 600 years or so, Western music has been largely polyphonic, resulting in a need for the
different members of an ensemble to accurately synchronise their individual performances both with each
other and with the conductor in order to produce a satisfactory "ensemble" performance (Rasch, 1979).
Thus, musicians playing under a conductor's direction must be proficient at discerning which features of
the conductor's gestures indicate where the beat is located. Likewise, the conductor must be proficient at
showing the precise location of the beat.
Gilden (1991), Stetson (1905), and Wood (1995) note that the visual impression of a rhythmic pulse or
"beat" may be produced by a conductor using a gesture in which the velocity structure of the movement is
varied along its path such that the point of maximum velocity is accompanied by a change of direction in
the trajectory of the movement. Clayton (1986) offers a similar description, but adds that in his studies the
beat point nearly always coincided with the baton positional minima, that is, the lowest point of the baton
trajectory.
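These descriptions suggest a simple operational definition: a beat lies either where baton speed peaks while the direction of travel reverses, or, on Clayton's criterion, at the positional minimum of the trajectory. The latter can be sketched in a few lines; the array layout, sampling rate, and toy trajectory below are illustrative assumptions, not data from any of the studies cited:

```python
import numpy as np

def beat_indices(y):
    """Indices where baton height y is a strict local minimum --
    Clayton's criterion that the beat coincides with the lowest
    point of the baton trajectory."""
    return np.flatnonzero((y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])) + 1

def baton_speed(xy, fs):
    """Instantaneous baton speed (central differences) from an
    (N, 2) array of positions sampled at fs Hz -- one of Gilden's
    kinematic variables."""
    return np.linalg.norm(np.gradient(xy, 1.0 / fs, axis=0), axis=1)

# Toy trajectory sampled at 25 Hz: the baton dips to its lowest
# point once per second, so one interior beat is expected at t = 1 s.
fs = 25.0
t = np.arange(0.0, 2.0, 1.0 / fs)
xy = np.column_stack([0.1 * np.sin(2 * np.pi * t), np.abs(np.sin(np.pi * t))])
beats = beat_indices(xy[:, 1])
print(beats / fs)        # estimated beat times in seconds
```

On real movement data one would smooth the trajectory before locating minima; the velocity-maximum criterion can be checked against the same `baton_speed` output.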
With regard to the 'style' of gestures used by conductors, in a series of studies investigating two-beat
('up-down') gestures, Stetson (1905) concluded that the conductor's downward beat-stroke was a ballistic
movement, while the upstroke away from the beat was a more controlled movement. This view allows
modification of the gesture, in terms of velocity and trajectory, only in a 'temporal window' during the
movement away from the beat. Wood (1995), on the other hand, suggests that all gestures, including the
beat-stroke, should be controlled movements, thus allowing modification of the gesture during the
formulation of the beat itself. In contrast with Stetson's (1905) view, Wood's (1995) approach implies that
additional information may be conveyed in the conductor's arm movements during the evolution of each
beat, which may assist musicians in more accurately predicting the exact instant of each beat. The result of
this ought to be a higher level of synchronisation between musician and conductor.
Musicians' ability to synchronise with a conductor has been investigated by Clayton (1986), who, in a
series of studies, systematically controlled the presence or absence of the visual (conductor) stimulus, the
auditory (ensemble) stimulus, or the written music, in order to investigate the relative contribution of these
sources of timing information to an ensemble's ability to play "together". He found that, except under
certain circumstances, the conductor provided general, as opposed to specific, timing information, while
the ensemble exerted the most overall influence over the synchronicity of the players. Overall, participants
revealed a mean asynchrony of 26ms between their responses and the conductor's gestures.
Assuming, then, that musicians do look to the conductor for at least some timing information, can
sensitivity to such information be increased with practice? Runeson and colleagues (Runeson, 1984;
Runeson & Frykholm, 1981, 1983) suggest that sensitivity to the kinematics of human movement may be
heightened through experience. 'Kinematics' is here defined as, "the changes in the optic array that occur
when an object moves...Spatial position, velocity, acceleration, and all other derivatives of a motion path
may be regarded as kinematic variables." (Gilden, 1991, p.555). Furthermore, Rubel (1985) and Tees
(1994), for example, suggest that visual and auditory sensory thresholds, and intersensory responsiveness,
are affected by experience. This research, therefore, implies that performance on tasks involving
temporally based, multimodal events, such as synchronising a motor response to a visual event, is likely to
improve through long-term experience.
The research reviewed above prompted three questions, each with an associated prediction:
1. How accurately can people synchronise a motor response with a conductor's demarcation of the beat
when only information regarding the movement of the baton is available? It was predicted that
temporal synchronisation between participants' responses and the conductor's beat would be
somewhat poorer than Clayton's (1986) findings.
2. Does the addition of temporal information in the form of the kinematics of the conductor's elbow and
wrist movements to the trajectory of the baton affect the accuracy with which people can
synchronise a motor response with the conductor's beat? It was predicted that synchronisation under
these conditions would be at least as good as Clayton (1986) reported.
3. Does amount of experience of playing under a conductor's direction affect the accuracy with which
people are able to synchronise their response with the conductor's beat? It was predicted that those
with higher levels of experience would be able to achieve a higher level of synchrony with the
conductor's beat than those with lower levels of experience.
Method
The Conductor
A single, professional conductor, henceforth referred to as 'MD', was the conductor in all stimuli.
The Participants
Thirty-two participants (8 males, 24 females) took part in this study. All participants were considered to be
'musicians', although their experience under a conductor's direction varied from very little to rather
considerable.
The Stimuli
Participants watched a sequence of digital video clips, each of which showed a conductor either beating a
single upbeat and downbeat at one of three tempi (slow, medium, or fast), or beating an upbeat followed by
a series of beats at varying tempi. The conductor was filmed as if from the 'cello section' of an orchestra.
There were two versions of each clip: The full-cue version showed the full image of both the conductor's
movements and those of the baton; the baton-only version showed only the movements of the baton, the
conductor's limb movements having been digitally removed from the image. Each participant saw either
the baton-only or the full-cue stimulus, and each condition comprised a total of 22 different clips,
containing a total of 129 beats.
Apparatus and Materials
Computer Hardware. Participants viewed the stimuli on a Personal Computer, and responded to each beat
by pressing the left button of a computer mouse. The mouse was selected for its particularly loud
and positive 'click' so as to provide participants with explicit auditory feedback concerning the moment at
which they responded.
Questionnaire. Each participant completed a questionnaire as part of this study, the purpose of which was
to elicit background information regarding, for example, the total amount of experience they had under a
conductor's direction in general, and whether they had any experience under MD's direction.
Procedure
Participants were tested individually, and were required to press the left mouse button in time with the
conductor's demarcation of the beat, as if they were using the mouse to 'play a note' on each beat.
Participants were randomly selected to receive either the baton-only or full-cue stimulus, though
presentation order of all clips was the same for all participants. After responding to the visual stimuli,
participants were then asked to complete the post-test questionnaire. Once this was completed the
experiment came to an end, participants were informed of the purposes of the study, and thanked for their
time.
Results
Results are presented with reference to each of the three research questions in turn.
1. How accurately can people synchronise a motor response with a conductor's demarcation of the
beat when only information regarding the movement of the baton is available?
Table 1 shows the percentages of early, accurate, and late responses, and overall mean response, by
participants in each condition. As can be seen, only 33.3% of beats in the baton-only condition
received a response that might be considered accurate, while the majority of beats received a late
response. The average response by participants who received the baton-only stimulus was 82ms late.
In line with the prediction made for research question 1, this figure is somewhat poorer than that
reported by Clayton (1986).
Table 1. Percentages of early, accurate, and late responses, and overall mean response, by
participants in each condition.
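The classification underlying Table 1 amounts to scoring each response by its signed asynchrony (response time minus beat time) and treating a symmetric window around the beat as 'accurate'. A minimal sketch follows; the ±40 ms window is an illustrative assumption, since the criterion actually used is not stated here:

```python
import numpy as np

def summarise(response_ms, beat_ms, window_ms=40.0):
    """Percentages of early / accurate / late responses and the mean
    signed asynchrony (positive = late).  The +/- window_ms accuracy
    band is an assumption, not taken from the study."""
    asyn = np.asarray(response_ms, float) - np.asarray(beat_ms, float)
    early = 100.0 * np.mean(asyn < -window_ms)
    late = 100.0 * np.mean(asyn > window_ms)
    return {"early_%": early,
            "accurate_%": 100.0 - early - late,
            "late_%": late,
            "mean_asynchrony_ms": asyn.mean()}

# Toy data: three responses to beats at 0, 1000 and 2000 ms;
# one response (1120 ms) falls outside the accuracy window.
print(summarise([30.0, 1120.0, 2010.0], [0.0, 1000.0, 2000.0]))
```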
2. Does the addition of temporal information in the form of the kinematics of the conductor's elbow and
wrist movements to the trajectory of the baton affect the accuracy with which people can synchronise
a motor response with the conductor's beat?
Again with reference to table 1, it can be seen that participants in the full-cue condition responded
accurately to only 23.3% of beats, while all other beats received a late response. Overall, the average
response of full-cue participants was late by 84.67ms, a similar, though slightly higher, figure
compared to baton-only participants. Thus, the prediction made regarding this question was not
supported by the data. Full-cue participants in fact demonstrated slightly poorer levels of
synchronisation than baton-only participants.
3. Does amount of experience of playing under a conductor's direction affect the accuracy with which
people are able to synchronise their response with the conductor's beat?
Table 2 shows the percentage of beats that received an accurate response by each experience group,
between conditions. Overall, it was found that the total number of years' experience following a conductor was
negatively related to a person's ability to synchronise a motor response with the conductor's beat.
Table 2. Percentage of beats that received an accurate response by each experience group, between
conditions.
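One plausible way to quantify such a negative relationship is a rank correlation between experience and accuracy; both the statistic and the data below are illustrative assumptions, since the measure of association actually used is not reported here:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation (assumes no tied values): the
    Pearson correlation of the two vectors' ranks."""
    rank = lambda v: np.argsort(np.argsort(v))
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

# Hypothetical participants: years playing under a conductor
# versus percentage of beats receiving an accurate response.
years = np.array([1, 3, 5, 8, 12, 20])
accurate_pct = np.array([40.0, 36.0, 33.0, 30.0, 25.0, 22.0])
rho = spearman_rho(years, accurate_pct)   # strictly decreasing, so rho = -1
```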
In addition, as table 3 shows, experience under the specific direction of MD, the conductor used in the
stimuli, was also negatively related to levels of synchronisation.
Table 3. Percentage of beats that received an accurate mean response by each MD experience group,
between conditions.
MDYes MDNo
These results, then, do not support the prediction made for research question 3.
Discussion
Summary of results
To summarise, then: Participants demonstrated a general tendency to respond late; in addition, full-cue
participants demonstrated slightly lower levels of accuracy compared to baton-only participants; finally,
amount of experience under a conductor's direction in general, and under the specific direction of MD, was
negatively related to synchronisation ability. So, how might these findings be explained?
Possible strategies used by participants when attempting to synchronise with the conductor:
'Pick a point and stick to it'
Clayton (1986) suggests that people apply a two-stage strategy when attempting to coordinate a series of
motor responses with a series of visual events. The first stage involves selecting a somewhat arbitrary
asynchrony between stimulus and response, while the second stage involves replicating this offset under
limitations imposed by, for example, differences in ability to track visual versus auditory stimuli (e.g.,
Bartlett & Bartlett, 1959), and variability in motor control/response (e.g., Trew & Everett, 1997). Clayton
(1986), therefore, suggests that the level of synchronisation achieved by an individual is somewhat
arbitrary. Another approach is taken by advocates of the Evaluation Hypothesis
Evaluation Hypothesis
It has been observed that synchronisation tasks typically reveal a systematic negative asynchrony between
the stimulus and participants' responses − that is, responses tend to precede stimuli by a few tens of
milliseconds (Vos, Mates & van Kruysbergen, 1995). Vos & Helsper (1992) suggest that this effect arises
as a result of an asymmetric evaluation function such that participants would rather respond a little early
than risk responding late.
It might be suggested, however, that musicians favour a late response over an early one, resulting in
responses which 'drag' behind the stimulus. Such an evaluation might be learnt, for example, from playing
in ensembles, where it is discovered that 'playing late' is the safer option if you are not sure about the
placement, or quality, of your entry. If such an asymmetric evaluation was learnt through experience, one
might expect those with the most experience of playing in conducted ensembles to respond late more often
than those with less such experience. Indeed, this is exactly what was found in the present study.
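The asymmetric evaluation function can be made concrete with a toy decision model: if one sign of error is penalised more heavily, the cost-minimising aim point shifts away from the beat in the opposite direction. The linear penalties, jitter level, and grid below are illustrative assumptions, not the form used by Vos & Helsper:

```python
import numpy as np

def expected_cost(aim_ms, k_early=1.0, k_late=3.0, jitter_ms=30.0, n=200_001):
    """Monte-Carlo expected cost of aiming `aim_ms` relative to the beat,
    given Gaussian motor jitter, when late errors cost k_late per ms and
    early errors cost k_early per ms (illustrative parameters)."""
    err = aim_ms + np.random.default_rng(0).normal(0.0, jitter_ms, n)
    return np.where(err > 0, k_late * err, -k_early * err).mean()

aims = np.arange(-60, 61, 5)
# Lateness penalised more heavily -> the optimal aim is early (negative),
# reproducing the usual negative mean asynchrony.
best_early = aims[np.argmin([expected_cost(a) for a in aims])]
# Swap the penalties -> the optimal aim is late (positive), as the
# ensemble-learning account sketched above would predict for musicians.
best_late = aims[np.argmin([expected_cost(a, k_early=3.0, k_late=1.0)
                            for a in aims])]
print(best_early, best_late)
```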
Style of Beat
Overall, visual perception of a conductor's arm movements per se may not be as important as the 'style' of
the baton gesture that results from a given style of arm movement produced by particular combinations of
relative order and degree of joint mobilisation. In support of this idea, both Boult (1963) and McElheren
(1966) advocate the use of smooth and predictable gestures to help players more accurately predict the
position of the next beat, and also 'feel where they are' between beats.
It may be suggested, then, that as long as at least the baton is visible, the ability of an ensemble to
synchronise with the conductor may depend upon the conductor's ability to conduct in a particular, perhaps
somewhat smooth style, combined with experience-related attempts to respond after the beat, coupled with
experience-induced anticipatory tendencies.
Conclusion
In conclusion, it might be encouraging to consider the general lack of ability of participants in the present
study to synchronise a motor response with the conductor's demarcation of the beat in light of an
observation by Sternberg & Knoll (1994): "Regardless of their inability to be on target with the required
experimental tasks....expert musicians will nevertheless perform as skilled soloists and/or as a fine
ensemble in a musical setting. In other words, they will all come in slightly early or slightly late, but they
will all make their musical entrances together, as a group." (p.241).
References
Bartlett, N. R. & Bartlett, S. C. (1959). Synchronisation of a motor response with an anticipated sensory event. The
Psychological Review, 66(4), 203-218.
Boult, A. (1963). Thoughts on Conducting. London; Phoenix House.
Clayton, A. M. H. (1986). Coordination between players in musical performance. Unpublished PhD Thesis; Edinburgh
University.
Gilden, D. L. (1991). On the origins of dynamical awareness. Psychological Review, 98(4), 554-568.
McElheren, B. (1966). Conducting Technique for Beginners and Professionals. N.Y.; O.U.P.
Rudolf, M. (1980). The Grammar of Conducting: A Practical Guide to Baton Technique and Orchestral Interpretation
(Second Edition). London: Collier Macmillan Publishers.
Runeson, S. (1984). Perceiving people through their movements. In: Kirkaldy, B. (Ed.), Individual Differences in Movement.
Lancaster: M.T.P. Press.
Runeson, S. & Frykholm, G. (1981). Visual perception of lifted weight. Journal of Experimental Psychology: Human
Perception and Performance, 7, 733-740.
Runeson, S. & Frykholm, G. (1983). Kinematic specification of dynamics as an informational basis for person and action
perception: expectation, gender recognition, and deceptive intention. Journal of Experimental Psychology: General, 112,
585-615.
Sternberg, S. & Knoll, R. L. (1994). Perception, production, and imitation of time ratios by skilled musicians. In Aiello, R. &
Sloboda, J. A. (Eds) Musical Perceptions. Oxford U. P.; Oxford
Stetson, R. H. (1905). A motor theory of rhythm and discrete succession: Parts I and II. Psychological Review, 12, 250-270,
293-350.
Tees, R. C. (1994). Early stimulation history, the cortex, and intersensory functioning in infrahumans: Space and time. In: D.
J. Lewkowicz & R. Lickliter (Eds.), Development of Intersensory Perception: Comparative Perspectives. Hillsdale, NJ;
Erlbaum.
Trew, M. & Everett, T. (1997). Human Movement: An Introductory Text (3rd Ed.). Churchill Livingstone; London.
Vos, P. G. & Helsper, E. L. (1992). Tracking simple rhythms: On-beat versus off-beat performance. In: F. Macar, V.
Pouthas, & W. J. Friedman (Eds), Time, action and cognition. Dordrecht; Kluwer.
Vos, P. G., Mates, J. & van Kruysbergen, N. W. (1995). The perceptual centre of a stimulus as the cue for synchronisation to
a metronome: Evidence from asynchronies. The Quarterly Journal of Experimental Psychology, 45A(4), 1024-1040.
Wood, M. (1995). Notes on expressive gestures: Observing and quantifying the 'character' of movement. Unpublished
Research Manuscript.
Keynote abstract
M. Baroni
Meaning in Music
The paper is divided into three parts: in the first part the problem of the difference between verbal and
musical meanings is considered. Examples are given of allusions to human experiences that music can
convey: the main areas being speech, physical gestures, aspects of cultural habits and synaesthetic
sensations. In the second part, the idea of musical grammar is discussed from two particular points of
view: grammar is a set of rules that govern the organisation of musical structures; grammar is also a
set of conventions that allow the interpretation of musical structures as allusions to human experience.
The main aspect of such allusions is that of "emotional schemes". Musical language is able to mix and
blend musical structures in order to obtain meanings that did not exist before their specific musical
expression. In the third part of the paper the problem of verbal interpretation of musical meaning is
discussed. The conceptual resources of verbal language are not always adequate to describe the
substantial contents of a piece of music.
Introduction
To speak of meaning is one of the most complex and controversial issues in the field of the
philosophy of music. I do not see any point in going into the history of this concept because all the
participants in this Conference are undoubtedly very familiar with this area. Nor do I intend to speak
about the complex history of the discussions about the relationships between musical and verbal
languages. Some of the protagonists of this Conference such as John Sloboda (1985) and Eric Clarke
(1989) have offered well known contributions to such debate. To introduce the problem I will limit
myself to saying that the meaning of the word "meaning", when applied to music, can become
extremely ambiguous. Together with many other words derived from the linguistic area, it risks
becoming a mere metaphor, losing its properties of conceptual definition. The meaning of a word
does, in fact, have certain precise aims. First of all, to define categories of objects or events which
are part of ordinary human experience. Secondly, to distinguish one concept from other similar ones.
In other words, the main aims of a language are to reduce the possibilities of ambiguity, to codify the
relationships between the meaning and the phonetic form of the words, and to limit, whenever
possible, the changes in such codified relationships. For these reasons the words of a given language
are stable, are finite in number and have a lexical meaning given by the dictionary.
In music none of these conditions exists: its meanings, even though they can be interpreted, are not
finite in number, since they change from composition to composition, and for this reason music does
not have a lexicon; in addition, the meanings are not aimed at categorizing events or objects, but
rather at evoking living experiences also linked to emotional conditions. In other words, music does
not have real semantic properties, and does not have real meanings, if we are to use the two terms in
their proper linguistic sense. But in everyday language, and also in scientific conferences, we insist on
adopting these words, apparently without any particular difficulties. In some cases, and in other
languages, a sort of search for alternative words can be observed: signification is used in French, or
senso in Italian. In any case the same problem arises: that of understanding more precisely what we
are referring to when we speak of musical meaning and consequently what the difference is between
musical and linguistic meaning.
In order to discuss this problem adequately three conditions are necessary: firstly to define the
non-conceptual nature of musical meaning, secondly to describe its relationships with musical
structure and finally to indicate to what extent it is possible to use verbal language in order to describe
the meaning of a piece of music. My speech is therefore divided into three parts, each of them devoted
to one of these three conditions.
the words and gestures themselves. In the new situation they were conventionally assigned to the
corresponding structural events and this heritage was unconsciously passed on to the musical
competence of many generations of listeners and of composers. The intuitions of Roland Barthes
(1953) about the passage of stylistic schemes from one text to other texts, or the analyses of Kofi
Agawu in the more specific musical system of classical style (1991), that is, the so-called
"intertextual" theories also applicable to music, may help to give the above-mentioned stylistic
transformations a wider theoretical context. Observed from this point of view, the problem acquires
further sense: the meanings assigned to the structural features of instrumental music were able to
achieve significance because they passed from composer to composer and from composers to
listeners; in other words, because they became a social phenomenon and implied a wide circulation of
experiences. Although these socially accepted conventions may not always have been explicit or
conscious, they nevertheless had no difficulties in passing from mind to mind and from epoch to
epoch. A great number of cues, observable in the writings of musicians and musicologists of the XIX
century (Bent 1994), show that such gestural conventions were still present when the idea of absolute
music imposed its rights on European aesthetics. Scientific research devoted to the problem of the
relationships between physical gestures and music (Trevarthen 1999-2000; Krumhansl & Schenck 1997)
confirms the physiological and psychological depth of the phenomenon and, for this reason, can help
explain why such phenomena were able to remain an uninterrupted tradition for so many centuries.
Another particularly important source of musical meaning is to be found in the traditional
connotations assigned to different instruments of the orchestra, mainly, although not exclusively,
linked to their historical and anthropological origins. For example, the pastoral heritage of the sound
of the oboe, the military associations of the trumpet or those of hunting linked to the horn, have been
frequently used in order to convey particular meanings based on socially accepted conventions of
interpretation. Conventions like these derive from geographical conditions: a particular use of the
guitar evokes Spain and many instrumental timbres are related to specific cultures: Sardinia, Sicily,
India, Japan and so on. Such "geographical" evocations are not restricted to instrumental timbres, but
also to a number of other features such as an exotic scale or a particular form of cadence (the falling
fourth for Russian music). Gino Stefani (1987), moreover, observed that many social practices
(marching, dancing and other ritual behaviour linked to music, as well as to gestures, or to ideological
symbols) left their traces in instrumental music and in the conventions of interpretation which had
been passed on socially.
The instrumental timbres, however, have other sources of meaning, not determined by their origins,
but by their intrinsic qualities. The famous Traité d'orchestration by Hector Berlioz (1843) is a real
mine of information in this respect: the sound of the flute, for example, is ... (examples to be supplied). The
quoted examples and most of all specific scientific research devoted to the problem (for example
Bismarck 1974, Erickson 1975) show that in Western musical tradition the perception of sound is
synaesthetically linked to sensations of a visual (darkness, brightness) or tactual nature (delicate,
rough, hard), to physical efforts (heavy, light) or dimensions (tiny, huge). Analogous observations
have been made by linguistic research in the field of so-called "phonetic symbolism" (Dogana 1983)
but in the case of music the treatment of texture, well known in analytical practice, can multiply these
synaesthetic effects, for example by means of accumulations and rarefactions, densities and
transparencies, ascents and descents. Sensations of force or softness are obviously due to quite
analogous qualities of dynamics.
From this brief overview it can be seen that the nature of musical meaning can be linked to a
multiplicity of fields of experience. It is also apparent that musical meaning has neither the function of
dividing the events of the world into conceptual categories, nor the possibility of alluding to all
aspects of human experience, many of which seem to be outside of its domain. But it is important to
emphasize that the relationships between musical meanings and the musical structures that have the
task of conveying them, are not left to individual preferences, but are actually governed by socially
spread conventions, even though they are learnt simply through exposure to listening. The latter condition
explains, among other things, why young children are able to interpret some elementary aspects of
music, an opportunity that is exploited in music education (Tafuri 1987). More generally speaking, it
is plausible to deduce from this mass of data and cues, that music may be conceived as a sort of
language and may therefore have certain linguistic properties, above all that of being a semiotic
system. So, in semiotic terms, it might be said to possess "signifiers" (Saussure 1922) or a "level of
expression" (Hjelmslev 1961) which must be able to evoke, or must be linked to, the "signified" or a
level of "content". A specific study devoted to the psychological nature of such links has not yet been
made, but we can tentatively observe that they are characterised by aspects of similarity between the
structure of the sounds and that of the human experiences to which the sounds allude. This is one of
the reasons why the terms "allusion to" or "evocation of" seem to be more apt for music than the term
"meaning". In verbal language this relationship is normally defined as "arbitrary" because there is no
similarity between the signifier and signified. Only some kinds of words (e.g. onomatopoeias) are
linked to their meanings through relationships of similarity, that is, in a "motivated" form. In music
the majority of the relationships is to be conceived as "motivated". The next section will try to explain
what the structural rules of this musical "language" are and how they can produce forms of
"motivated" similarity through the aspects of meaning to which they are able to allude.
Musical grammar
The author of the present paper, together with two colleagues, has recently published (Baroni,
Dalmonte, Jacoboni 1999) the results of an investigation devoted to the systematic study of the rules
of a particular style of music. Since such results are pertinent to our discussion about musical
semantics, it is necessary to make a short digression at this point in order to give information about
the aims and the contents of the research in question. The repertoire chosen for the study consisted of arias
taken from the chamber cantatas of an Italian musician of the XVII century, Giovanni Legrenzi.
The analysis took its starting point from the observation that a number of musical events always
involved the same structural features, in the same order and in the same quantity: one example of such
an event is the dimension of the phrases, others are the forms of the cadences, the correspondence
between verbal and metrical accents, the nature of the harmonic sequences, and so on. All seemed to
happen as if the composer of the music had been following particular rules and respecting them while
composing. This is nothing new: the aim of analysis is always to find regularities, and a composer
always follows rules when composing. But what exactly is the nature of such rules? Are they
describable? A particular problem arose at this point: while the structural regularities are always
concretely observable in a text, the rules are not in the text, but in the mind of the composer. From an
analytical point of view they are mere hypotheses. The musical situation is quite similar to that of
linguistic grammar. In linguistic practice the only way to validate the hypothesis of the empirical,
psychological existence of the rules of the grammar is to demonstrate the possibility of their
producing phrases that speakers of the language can judge grammatically correct. In our case the
grammar ought, in theory, to produce music that a competent listener might judge as corresponding
reasonably to Legrenzi's arias. So we decided to construct a system of hypothetical rules able to
"describe" exhaustively the structures used in the repertoire, and to feed the rules into software
producing "Legrenzian" music.
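The generative mechanism just described, a fixed set of permitted rules from which the composing program chooses at random, can be sketched in a deliberately toy form. The rule set below is invented purely for illustration and bears no relation to the actual rules of the Legrenzi grammar:

```python
import random

# A toy generative grammar: each nonterminal maps to a list of permitted
# expansions, and the "composer" picks among them at random. This only
# illustrates the general mechanism described in the text, not the system
# of Baroni, Dalmonte and Jacoboni.
RULES = {
    "ARIA": [["PHRASE", "PHRASE", "CADENCE"]],
    "PHRASE": [["MOTIVE", "MOTIVE"], ["MOTIVE", "CADENCE"]],
    "MOTIVE": [["C4", "E4", "G4"], ["D4", "F4", "A4"], ["E4", "G4", "B4"]],
    "CADENCE": [["G4", "C4"], ["B3", "C4"]],
}

def expand(symbol, rng):
    """Recursively rewrite a symbol by randomly choosing a permitted rule."""
    if symbol not in RULES:          # terminal: a concrete pitch
        return [symbol]
    production = rng.choice(RULES[symbol])
    notes = []
    for s in production:
        notes.extend(expand(s, rng))
    return notes

rng = random.Random(0)
print(expand("ARIA", rng))
```

Each run yields a different but always rule-conforming sequence; in the same spirit, the "Legrenzian" outputs respected the grammar while the particular choices were left to chance.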
We expected that the resulting arias could be judged musically "correct". A judgement of
"grammatical correctness", however, soon appeared implausible: what is correctness in music? Only
the prohibition of parallel fifths and octaves, and other such prescriptions? The results produced by the computer
taught us that in music an "ordinary" or "daily" language does not exist as distinct from the "artistic"
one, and that all rules have exclusively stylistic aims. More precisely this means that another
fundamental difference exists between language and music: the former has a phonetic system aimed at
producing lexical items, linked to conceptual meanings, and has syntactical rules governing the
relationships between the lexical items of a phrase. The "grammatical correctness" depends on this
specific system of rules which are applied in ordinary language. Only outside this use can a language
convey other "meanings" aimed at evoking living experiences and not at producing conceptual
communication: meanings evoked for example by the rhythm of the discourse, by its phonetic
properties, by the use of connotations, by all the resources of the poetical uses of the language.
Grammatical correctness refers to the first of the two levels, that of daily language, and not to the
second "poetical" level. Music, however, possesses exclusively this second level. This does not mean,
however, that music does not have rules and does not have syntax. It means, though, that its rules and
syntax have different functions from that of ordinary language, and that the concept of "grammar" is
different in the two systems: when referred to music it does not have the same meaning. In any case
we decided to accept this transposition and to use the same term, in order to emphasize that musical
rules are largely based on discrete entities. Only well-definable features are, in fact, considered in
musical grammar: metrical accents, durations, pitches, intervals, chords, scales, degrees of the scales,
and so on. This renders its rules exactly measurable, just as the rules of linguistic grammar, though
functionally different, are exactly measurable.
The most interesting things we learned from the computer outputs were that the texts mechanically
produced by the machine not only had structures similar to those of the initial repertoire, but also
gradually tended to reproduce its expressive characters: the computer arias are not arbitrary
successions of notes, but are compositions that can make sense to a listener, and that in some cases
he/she might even interpret as being "true" arias of the XVII century. This singular phenomenon gives
a concrete demonstration of the distinction proposed by Umberto Eco (1979) between intentio
auctoris (the expressive intentions of the "empirical" author of a text) and intentio operis (the contents
a text can communicate, independently from the intentions of its author). In our case, it was quite
certain that the composer of the mechanical arias, the computer (which made its choices on the basis
of a series of random numbers), had no expressive intentions. Any such expression that the listener
may have felt was simply included in the rules (intentio operis). The musical "meaning" of our
mechanical arias (the interpretation allowing the listener to find plausible references to human
experiences) did not come from a lexicon but from structural rules, that is, from a system of links
between certain features, that were known and commonly used in Legrenzi's time. It is therefore
plausible to imagine that every epoch and every style has its specific rules, different from those of
other epochs and styles.
But other more specific aspects of musical meaning emerged from the Legrenzian research. The most
important is the distinction between different categories of rules. In our experiment we fed just one set
of rules into the software: those which were common to all the arias of the repertoire. The "expressive
meanings" of the outputs are exclusively the result of these "general" rules. We also identified, in
some detail, but without giving them to the computer, more "specific" rules which refer to
single fragments of an aria and modify some aspects of the "general" rules. Normally such
fragments were linked to the words of the lyrics. In one of them, for example, the poet spoke of a
snake in the bosom of the lover (Nutro il serpe nel mio seno) and many aspects of the structure
became unusually tortuous, by means of particular modifications of the ordinary rules; in another, the
words alluded to a heroic situation (Mia ragione all'armi all'armi) and here the music adopted
traditional "battle" models common at the time. In such examples particular rules momentarily took
the place of other more "common" ones. We called the former category "specific expression rules",
and the latter "diffuse expression rules". The difference between the two categories of rules is not just
a question of their more or less extensive application. It is above all a question of their nature: a
diffuse form of expression, the common expressive character of all the arias of Legrenzi, though
undeniably present, is difficult to describe in verbal terms and so its nature, far from appearing
immediately "semantic", seems to be nearer to what R. Jakobson (1970) or N. Ruwet (1972) called
"self-significant". A specific form of expression, on the contrary, is easier to define and has aspects
more similar to those of a conceptual meaning. The distinction corresponds to what, in terms of
semiotic theory, has been called internal and external semantics, respectively (Nattiez 1989) or
congeneric and extrageneric meaning (Coker 1972). Another important category of rules concerns
those which are applied not to the whole repertoire, nor to a small fragment of an aria, but to a single
composition. For example, there are particularly brilliant or particularly sombre arias where the
common rules are applied, but with different percentages of choices for certain rhythmic, melodic,
harmonic traits. In this case we are dealing with "meta-rules" which govern other rules without
changing them. The semantic results of meta-rules are not so well definable as those of "specific
expression", but not so vague as those of "diffuse expression". Their presence shows that the
difference between specific and diffuse expression might be considered a sort of "continuum" and not
a sharp distinction between two separate levels.
It is important to repeat that the choices made by the computer were not oriented by semantic inputs.
This means that the choices were always made randomly from among the permitted rules; in other
words, their only task was to respect the structural possibilities of the grammar and not to look for
particularly meaningful results. The arias composed by the software showed that such rules
automatically ensure interpretable links between different structural features and produce expressive
effects in all listeners who have stylistic competence. Obviously, the results obtained through the
random choices of the machine were by no means original. A listener could accept one or two arias.
But one hundred or one thousand arias (which the computer can easily compose and eventually did
compose) produce a sense of weariness due to the absence of creative interest. The main result of the
experiment, however, is that the simple application of structural rules is able to produce a particular
kind of musical meaning: the arias can be considered "expressive". The "analogical" relationships
with human experiences, in this particular case, are not easy to describe. They may be felt as allusions
to the gestural behaviour of the epoch, to a vague evocation of the laments of an unhappy lover, or to
his tendency to use the lament as an elegant form of seduction. But their verbal interpretation is not
strictly necessary in order to understand that the arias "make sense".
A musical grammar can be defined as a set of expressive possibilities that composers and listeners
have at their disposal: the former in order to compose, the latter in order to interpret music. Such
possibilities imply two different resources: a system of structural rules fixed by the culture of a given
epoch (e.g. more or less explicit and conscious rules of rhythm, melodic structure, counterpoint,
harmony, form, etc.) and, at the same time, a system of semantically accepted conventions (even less
explicit and conscious than the former), which allow the listener to interpret musical structures as
analogical allusions to human experiences. Both structural rules and semantic conventions apply to
specific and particular features (or systems of features) of the structure (rhythm, melodic profile,
harmonic sequences, and so on) and not to their global organisation as concretely perceived during a
performance. Their organisation is always left to the free invention of the composer, who has to
respect the structural rules and the semantic conventions but is not obliged to adopt pre-fixed
solutions. He/she must use rules and conventions in different mixtures and doses, in order
to obtain interpretable results which correspond to his/her expressive intentions.
Finally, it should be added that the rules of musical grammar must be well distinguished from the
psychological procedures that are to be applied in composing and listening activities. Musical
grammar (like verbal grammar) is a non-temporal structure, an abstract system of rules that we can
learn, we can list and we can describe. But when we use them in making music (or in speaking) we
have to solve new problems: those connected with the organisation of events in time (for example,
problems of perception and of memory) that do not coincide with the rules of the grammar. In
listening activities the system of cues studied by Irène Deliège (e.g. in Deliège-Mélen 1997)
presupposes the knowledge of grammatical rules and their presence in the long term memory, but the
research is devoted to quite different phenomena. The same must be said for the less studied, but no
less important, problem of composing activities (Baroni 1999). In my opinion the presence of the two
distinct domains and their specific relationships (grammar and its application) has not been considered
with due attention by research.
Musical hermeneutics
The fact that the composer is free to mix and dose the structural events and the semantic conventions
without being obliged to adopt pre-fixed solutions has important consequences on the concept of
meaning: in music we never have to deal with unequivocal meanings, but only with a set of different
cues that allow interpretation. Each of such cues is based on non-arbitrary semantic conventions, but
their overall effect can leave margins to partially different interpretations. Within the processes of the
interpretation, however, another problem arises: in many cases it is quite evident that the allusions to
human experiences evoked by music may be of very different kinds. In other words, the "motivated"
relations of similarity between musical form and musical meaning are not arbitrary, but neither are they
unequivocal. For example, two musical phrases organised as antecedent and consequent are thought
of (as implied by their names) as if they were two parts of a sort of logical thinking and their
interpretation can be based on this kind of "meaning". But in other cases (for example in Italian
terminology) they are named "proposal and response", in accordance with the XVIII century idea that
music was a sort of dialogue (Rosen 1971). In the same years, however, Giuseppe Carpani in his
Haydine (1823) also spoke of architectural symmetries that music could create in examples like these;
the idea of tension and distension is extremely common in musicological literature, whereas
mediaeval terminology, in similar situations, used the terms "open" and "closed". What, in short, does
the pairing of such terms mean? How can any coherent interpretation be reached? Are we to consider
dialogues, architectures, gestures, philosophical thinking, or perhaps doors? How is it possible to find
some sort of unity in this irrational conglomeration?
All listeners to music are intuitively well aware of the existence of a coherence, and scientific
explanations have been proposed to support this fact. For example Michel Imberty (1976) says that
musical listening always implies the use of what Piaget called "affective schemes": this means that a
listener immediately feels that an affective relationship is present among the different objects a piece of music
can evoke, that they are unified by a symbolic reference to a common emotion and are not necessarily
definable in conceptual terms, but perfectly understandable in an intuitive way. In this case the
listener does not use logical thinking (which would reveal the previously mentioned conceptual
inconsistencies) but what Piaget (1945) calls symbolic thinking. And as far as symbolic thinking is
concerned, tension and distension, proposal and response, opening and closure, and so on, can have
the same affective function. This is a way of understanding reality that children normally use when
they play their symbolic games, one that Mediaeval thinking adopted to interpret the signs of heavenly
presence in the world (Eco 1981), that many cultures (including those of industrialized countries)
normally use in a vast variety of different ritual occasions (Firth 1973), that all of us adopt in our daily
life when we use metaphors, and that all forms of art demand from their participants. A number of
similar situations are, in fact, well known in the field of psychology. There are theories describing the
"semantics" of emotions that show how the relationships between different affections are easier to
describe through topological schemes than through conceptual definitions (Galati 1993): this means
that an analogous emotional quality can pertain to different and even apparently opposite objects
(Imberty 1979). In other words, the domain of the conceptualization of reality and that of emotional
responses to reality are mutually independent. Thus, the various different interpretations of a piece of
music not only have margins of freedom due to the varying levels of importance attached to its
musical structure, but above all need to be made in a form that is not exclusively conceptual.
After this panoramic look at the nature of musical meaning and the musical rules able to convey
meaning, we can now return to the initial problem of so-called absolute music with some concluding
remarks. Once again the comparison with verbal language can prove useful in this discussion. When
we speak, we all know from the start what we want to say and then look for the right words. In music
it is not necessarily the same: the contents do not always pre-exist the music that manifests them.
Musical creativity can imply that a composer finds particular aggregations of musical features that
have the power of alluding to new significant emotional situations and on this basis composes
interesting new music. So it may often happen that the aesthetic quality of a piece of music is not a
question of finding the right way to convey already known affective schemes, but rather of creating
affective situations which were not previously existent. Such situations, of course, do not possess
words able to designate them: they can be expressed even though they are not yet conceptualized and
perhaps will be never conceptualized. There are many examples of objects and living experiences that
we know very well without being able to name them. It is by no means necessary to give a name to
everything that we live. According to D. Raffman (1993) there are experiences that do not have a
name, that are ineffable.
S. Davies (1994) used the term "musical emotions" to describe the particular category of musical
meanings that are present only and exclusively in music. When Mendelssohn, in a famous letter to
M.A. Souchay (Oct. 15th 1842), stated that his musical thoughts were much more specific than the
words that could describe them, he was referring exactly to these kinds of musical meanings. This
particular concept of "musical emotion" is the only way of giving sense to the ancient formalistic
theories: when Jakobson and Ruwet said that music signified itself, they made an apparently absurd
statement if "itself" is understood in terms of structures (a note signifies a note). But it is not at all
absurd if "itself" signifies "a content that only music can give", a statement that can, moreover, be
easily extended to all other artistic languages: the meaning of a face in a drawing by Picasso is
produced by those very lines and does not preexist those lines. Thus, more generally speaking, the
problem of explaining the nature of "absolute" music, and the controversies regarding the formalistic
tendencies in the aesthetics of music, are not musical, but verbal problems.
My intention, of course, is not only to explain "absolute music", and formalistic tendencies. The point
of view I intend to assume is more general and primarily based on the distinction between diffuse and
specific expression rules. This continuum of possibilities, this mixture of allusions to not always
"effable" affective schemes and to "effable" images of other better known objects of experience (as
the gestures, the physical sensations, the references to social and cultural practices described in the
first part of this paper) is the ordinary condition of the Western music tradition. Often it is not easy to
make clear distinctions within these subtle mixtures of grammatical rules and these delicate
dosings of semantic-analogical conventions. So, in many cases the interpretation can become
problematic, particularly when we have to use words in order to interpret musical meanings.
We might define this approach to meaning as a "hermeneutic" one. There is a long philosophical
tradition of hermeneutics as the art of interpreting texts by means of words: initially the term was
applied to the Bible, but during the XIX century it was extended to history and to other fields; a
famous and controversial book by Hermann Kretzschmar (1903) applied it to music. It would seem
more opportune to adopt this term instead of the more common "interpretation", since the latter is
normally reserved for musical performance. It can be observed, however, that there is no difference in
the nature of the two kinds of interpretation: both of them are efforts to recognise the deep nature of
musical meanings. The only difference concerns the way of manifesting them: the possibility of using
sounds themselves is much more subtle and adaptable than that of using words. This particular
condition, typical of Western music, is not present in musical traditions that do not make use of
notation, such as jazz or many ethnic cultures. In our music, performance is explicitly thought of as
a particular language, distinct from that of composition, and aimed at "interpreting" compositional
language. This is clearly shown, for example, in the rules of performance proposed by Johan
Sundberg and his collaborators (1989) which are conceived as totally dependent on the written
structure (the composition) they have to make perceivable. With words this process becomes more
difficult. Words are cumbersome; they were born for different purposes. When applied to music their
main problem is to escape from their conceptual nature, to transform themselves into metaphors, to
become a sort of poetry speaking of music. I am personally convinced that musicology must be and
must remain a scientific discipline, but I am also aware that one of its subfields, musical
hermeneutics, needs to use verbal language in a non scientific way. The important thing is to maintain
a clear theoretical distinction between the two functions (something that is not always done
nowadays); that is, to avoid arbitrary overlappings between the domains requiring metaphoric and
even ambiguous language and those necessitating the use of scientific clarity along with the
systematic refusal of ambiguities.
REFERENCES
Agawu K., Playing with signs: A semiotic interpretation of classic music, Princeton University Press,
Princeton 1991.
Baroni M., Dalmonte R., Jacoboni C., Le regole della musica. Indagine sui meccanismi della
comunicazione, EDT, Torino 1999.
Baroni M. "Musical grammar and the study of cognitive processes of composition", Musicae
Scientiae, 3/1 (1999), pp. 3-22.
Barthes R., Le degré zéro de l'écriture, Seuil, Paris 1953.
Bent I. (ed.), Music analysis in the 19th century (II: Hermeneutic approaches), Cambridge University
Press, Cambridge 1994.
Berlioz H., Grand traité d'instrumentation et d'orchestration, Lemoine, Paris 1843.
Bismarck G. von, "Timbre of steady sound: A factorial investigation of its verbal attributes", in
Acustica, 30 (1974).
Carpani G., Le Haydine, ovvero lettere sulla vita e le opere del celebre maestro Giuseppe Haydn,
Padova, Tipografia della Minerva 1823 (anast. reprod. Forni, Bologna 1969).
Clarke E., "Issues in language and music", Contemporary Music Review, 4 (1989), pp. 9-22.
Coker W., Music and meaning; A theoretical introduction to musical aesthetics, The Free Press, New
York 1972.
Proceedings

[Conference programme grid: parallel paper sessions, followed by afternoon sessions S1-S6 (Music as an aid to learning and performance; Categorical rhythm perception and quantisation; Cognitive processes in music performance; Music in popular culture and everyday life; Pitch perception; Music and evolution), with convenors, chairs, discussants and paper titles; the multi-column layout is not recoverable from this source.]
Symposium Introduction
In this frame, the first contribution, by N. Birbaumer and A. di Gangi, investigates the relation between
complexity of brain activity and complexity of music. M. Olivetti Belardinelli and collaborators
investigate the influence of salience and tonality on recognition memory. M. Imberty shows the
influence of subjects' cognitive style and spontaneous rhythm on the establishment of perceptual
hierarchies. C.X. Rodriguez assesses the age-related relationships between performance recognition
and fragment recognition in children, obtaining a richer profile of children's musical thinking.
References
Birbaumer, N., Lutzenberger, W., Rau, H., Braun, C., & Mayer-Kress, G. (1996). Perception of music
and dimensional complexity of brain activity. International Journal of Bifurcation and Chaos, (2),
267-278.
Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Fotheringhame, D., & Young, M.P. (1997). Neural coding schemes for sensory representations:
Theoretical proposals and empirical evidence. In M.D. Rugg (Ed.) Cognitive Neuroscience (pp.
47-76). Hove East Sussex: Psychology Press.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge: MIT Press.
Olivetti Belardinelli,M., & Rossi Arnaud, C. (1999). Recollection and familiarity in recognition
memory for musical themes. In Proceedings of the XI Conference of the European Society of
Cognitive Psychology, Gand: Academic Press, 1999.
Riemann, H. (1914-1915). Ideen zu einer "Lehre von den Tonvorstellungen". Jahrbuch der
Musikbibliothek Peters, Leipzig.
Symposium introduction
Symposium Rationale: Research into musical behaviour has tended to focus on individual factors such as motivation
and has usually relied on counting the occurrence of specific behaviours over time to determine their salience in either
the development or continuation of musical engagement. The current symposium offers an alternative view of musical
behaviour, considering it primarily within its social context and how interpersonal and musical interactions impact
upon the individual participant's sense of self.
Aims: The symposium aims to offer a fresh insight into musical phenomena. Working from a primarily social and
psychodynamic perspective, the current symposium aims to explore how an individual's musical identity is influenced
by different forms of social interaction. The first paper explores children's relationships with their parents in their
development of a musical identity. The second paper focuses specifically on the issue of adoption and how an adopted
child develops a musical identity in relation to parents and other family members. The third paper looks at how social
networks influence work on specific musical tasks - creating a composition. Finally, the fourth paper explores how
Music Therapy interactions are used by a client to monitor a changing self-concept as a degenerative illness advances.
Proceedings paper
Introduction
Music stimuli produce emotion-specific physiological changes that reflect their emotional contents
(Krumhansl CL, 1997). Emotions expressed by musical stimuli are associated with their valence and
arousing qualities (North AC et al, 1997). Visual stimuli presented together with musical stimuli may
amplify the existing positive emotions (Wallbott HG, 1989). It is also important to highlight that
musical creativity and performance are closely linked to emotional associations (Lund NL et al, 1994).
Neurophysiological measures such as EEG recording and analysis may provide more objective
measures of the neural changes occurring during exposure to musical stimuli than psychological ones.
EEG studies also showed differences between musicians and non-musicians in musical processing.
Musicians show higher amplitudes in auditory evoked potentials over the fronto-temporal lobe than
non-musicians (Hibler N et al, 1981). Mismatch Negativity (MMN), a component of evoked
potentials indexing the preattentive detection of change in stimulus patterns, is larger in musicians.
This indicates improved sensory memory function, suggesting that cognitive components of
musicality (defined as the ability to temporally structure auditory information) are based on neural
mechanisms already present at the preattentive level (Tervaniemi M et al, 1997). Musicians show
changes in both hemispheres, and changes in brain activity involve larger brain regions in musicians
than in non-musicians. In addition, musicians show changes in the frequency band between 18 and 24
Hz, non-musicians between 13 and 18 Hz; this may suggest different strategies for processing musical
information in musically trained and untrained subjects (Petsche H et al, 1985). Musicians are
also faster than non-musicians in detecting auditory incongruities, their brain waves differ from
non-musicians and as a function of their familiarity with the melodies and the type of incongruity
(Besson M et al, 1994).
Chaos theory and non-linear dynamics provide useful tools for analyzing EEG recordings. Such
tools have been used increasingly over the last 30 years to describe and explain the
dynamics of many biological phenomena. Non-linear dynamics has been applied successfully in
medicine (Chialvo et al, 1987; Garfinkel, 1992; Guevara et al, 1981; Holstein-Rathlou et al, 1994),
biology (Olsen et al, 1990), physics (Kowalik et al, 1988), and chemistry (Rössler et al, 1978); since 1985
there has been increasing interest in EEG analysis with non-linear algorithms (Rey et al, 1997).
According to Kaplan (Kaplan et al, 1995) chaotic behavior is defined as "aperiodic bounded dynamics
in a deterministic system, with sensitive dependence on initial conditions".
A set of finite differential equations can be used to describe the dynamics of biological
systems that exhibit chaotic behavior (Hodgkin et al, 1939, 1952): through time-series analysis the
system can be reconstructed. The method is particularly useful for non-stationary signals such as
the EEG (Elbert et al, 1994). In order to reconstruct long temporal series, specific algorithms are
used (Lutzenberger et al, 1992a, 1992b). The number of independent variables required to reconstruct
the whole time series is defined as the "dimensionality". An attractor is defined as a phase-space subset to
which the phase-space trajectory may converge. The correlation dimension allows one to analyze the
attractor's dimensionality; a frequently used measure for dimensionality is the correlation dimension D2.
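The D2 correlation dimension is commonly estimated with the Grassberger-Procaccia algorithm: embed the series in a delay space, compute the fraction of point pairs closer than a radius r, and take the slope of log C(r) against log r. A minimal numpy sketch follows; the embedding dimension, lag, and radius range are illustrative choices, not the parameters used in this study:

```python
import numpy as np

def delay_embed(x, dim, lag):
    """Time-delay embedding of a scalar series into dim-dimensional vectors."""
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag: i * lag + n] for i in range(dim)])

def correlation_dimension(x, dim=3, lag=4):
    """Estimate D2 as the slope of log C(r) versus log r (Grassberger-Procaccia)."""
    pts = delay_embed(np.asarray(x, float), dim, lag)
    diff = pts[:, None, :] - pts[None, :, :]            # O(n^2): small series only
    dists = np.sqrt((diff ** 2).sum(-1))[np.triu_indices(len(pts), k=1)]
    radii = np.logspace(np.log10(np.percentile(dists, 5)),
                        np.log10(np.percentile(dists, 50)), 10)
    c = np.array([(dists < r).mean() for r in radii])   # correlation sum C(r)
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

# A sine wave traces a closed loop in delay space, so D2 should be close to 1.
t = np.linspace(0, 20 * np.pi, 1000)
d2 = correlation_dimension(np.sin(t))
```

In practice the scaling region must be chosen with care, and surrogate-data tests are needed before interpreting a finite D2 as evidence of deterministic chaos.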
Dimensionality can be a useful tool in understanding brain dynamics. Dimensionality may be defined
as the number of brain structures functioning at the same time. Changes in the complexity of brain activity
due to different conditions (such as sleep, epilepsy or Alzheimer's disease) or different tasks show up as different
brain dimensionality (Babloyantz A et al, 1986; Babloyantz A et al, 1986; Birbaumer N et al, 1995;
Fell J et al, 1993; Jeong J et al, 1998b). The higher the dimensionality, the more structures or
cell-assemblies are involved. Cell-assemblies are defined as groups of cells with plastic synapses,
distributed at any possible distance across the neocortex, with excitatory connections among each
other; the excitatory connections within a particular assembly are stronger than those of the background
assemblies responsible for other mental activity. Assemblies are formed through contiguity: the simultaneous
arrival of two impulses, or cascades of impulses, at plastic synapses strengthens their connection, so that on
the next occasion the input of only a few synapses is capable of firing the post-synaptic unit (Hebb,
1949).
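Hebb's contiguity principle can be made concrete with a toy weight-update rule; the network size, learning rate, and firing probabilities below are arbitrary illustrative values, not a model from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, eta = 8, 0.1                 # illustrative network size and learning rate
w = np.zeros((n_units, n_units))      # plastic synaptic weights

for _ in range(200):
    x = (rng.random(n_units) < 0.1).astype(float)   # sparse background firing
    if rng.random() < 0.5:                          # units 0 and 1 repeatedly
        x[0] = x[1] = 1.0                           # fire together: an "assembly"
    w += eta * np.outer(x, x)         # Hebb: simultaneous firing strengthens links
np.fill_diagonal(w, 0.0)
# Within-assembly connections end up far stronger than background connections.
```

After training, the weight between the co-active units dominates the weight matrix, which is the sense in which an assembly's internal connections stand out from the background.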
How can non-linear dynamics shed light on neural processing of music? Recent studies show that
emotional responses to music are reflected in the electrical activity of the brain. Subjects who report
positive feelings toward musical stimuli show decreased chaotic electrophysiological behavior. In addition,
rhythmic variations contribute much more to the subjective emotional response to music than
melodic variations (Jeong J et al, 1998a).
The aim of this work was to determine whether particular features of musical stimuli are reflected in
changes in the dimensionality of EEG recordings. Our questions were whether complex musical stimuli
evoke more complex brain responses (i.e., activity of more, and more extended, cell-assemblies), whether
rhythm or melody has an influence, and whether education and musical preferences play a role in
brain complexity.
Experiment
Subjects
Eighteen healthy, right-handed males aged between eighteen and thirty-one years (mean age:
22.0 years) participated. All subjects were free of any medication. Prior to the experiment, subjects were informed
about all aspects of the experimental procedure and then asked to sign an informed consent form
in accordance with the Declaration of Helsinki.
Methods
The experiment lasted for about 50 minutes and the procedure was identical for each subject. Subjects
sat in a sound-proof chamber in a comfortable reclining chair. At the beginning they put on
headphones while the EEG electrodes were attached to their heads. The electrodes were placed
at the following sites: F3, Fz, F4, C3, Cz, C4, Pz, P4. In addition, two electrodes were placed
on the mastoids. All channels were amplified with a bandwidth from 0.016 Hz to 70 Hz and sampled
at a rate of 256 Hz. Horizontal eye movements were recorded and EEG was corrected for ocular
artefacts. The EEG was recorded using Ag/AgCl electrodes according to the international 10-20
system. The experiment consisted of three blocks. Each block contained 12 trials which lasted for 15
s. Trials were separated from each other by intertrial intervals randomly varying between 8 and 15 s.
The random variation of the intertrial interval was introduced in order to prevent systematic EEG
variation associated with expectancy and preparation. During a single trial the acoustic stimuli were
presented without any other stimulation; subjects had to attend to these stimuli. After each trial,
subjects had to perform two subjective ratings, regarding (a) the subjective interest elicited by and (b)
the subjective complexity of the stimuli on a 1 to 9 analogue scale, with 1 indicating low
interest/complexity and 9 indicating highest interest/complexity. After the experiment and after
removal of the electrodes, subjects completed a short questionnaire asking:
1. How they estimate their own musical capability.
2. How many hours a week they perform music
3. How many hours a week they listen to music
4. How much they like classical music
5. How much they like popular music
6. Which instruments they perform
7. How they estimate their rhythmic capability
8. How they like dancing
9. How they like Jazz
10. Which kind of musical education they had.
Each question had a scale range from 1= very low to 5=very high. The questionnaire evaluated
musical habits of each subject.
During block 1 (mode "melody"), only the pitch of the piano sounds was varied (melodic complexity)
with rhythm kept constant. During block 2 (mode "rhythm"), only the rhythm of the wood-drum-like
sounds was varied, with tone frequency kept constant. During both blocks, three different kinds
of trials (four trials of each complexity condition) were presented in pseudo-randomized sequence. Three
degrees of complexity were introduced: the first condition consisted of periodic, the second of chaotic,
and the third of stochastic (i.e. without any set of deterministic laws) tone sequences. The third block
(mode "melody and rhythm") also contained 12 trials which were separated into three conditions. In
this block, variation of melody and rhythm was combined. Condition 1 contained periodic melody and
periodic rhythm, condition 2 periodic rhythm but stochastic melody, condition 3 stochastic rhythm
and stochastic melody. The computer-synthesizer-generated sequences of stimuli were recorded on
analogue tape and replayed from a tape recorder. The sequence of stimuli and the intertrial intervals were
identical across subjects.
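The paper does not specify how Chaos.app generated its sequences, but the three stimulus classes can be sketched with standard choices: a repeating cycle for the periodic condition, the logistic map in its chaotic regime for the chaotic (deterministic, aperiodic) condition, and uniform random draws for the stochastic condition. All parameter values here are illustrative assumptions, not the study's settings:

```python
import numpy as np

def tone_sequence(kind, n=32, seed=0):
    """Return n values in [0, 1] to be mapped onto pitches or durations.
    'periodic'   - repeating cycle
    'chaotic'    - logistic map, chaotic regime (deterministic, aperiodic)
    'stochastic' - uniform random draws (no deterministic law)
    """
    rng = np.random.default_rng(seed)
    if kind == "periodic":
        cycle = np.array([0.2, 0.5, 0.8, 0.5])
        return np.tile(cycle, n // len(cycle) + 1)[:n]
    if kind == "chaotic":
        x, out = 0.3, []
        for _ in range(n):
            x = 4.0 * x * (1.0 - x)      # r = 4: fully chaotic regime
            out.append(x)
        return np.array(out)
    if kind == "stochastic":
        return rng.random(n)
    raise ValueError(kind)

# Map values onto an assumed MIDI pitch range, e.g. C3 (48) to C5 (72):
pitches = 48 + np.round(tone_sequence("chaotic") * 24).astype(int)
```

The same values could equally be mapped onto inter-onset intervals for the rhythm mode, with pitch held constant.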
Apparatus
Acoustic stimuli were created using a Yamaha synthesizer connected via an Opcode Studio III MIDI
interface to a NeXT computer. For the generation of the MIDI signals we used the software package
Chaos.app, originally written by R. Bidlack and modified by ED Erwin.
Data Analysis
For each sequence an interval of 16 s was selected for the analysis; thus the length of each EEG trace
was 2048 points. For every EEG trace the following measures were calculated:
(i) EEG alpha power, obtained from the average log power in the range from 8 to 12 Hz. The power
spectrum was calculated by averaging the Fourier transforms of 15 overlapping 2s segments (256
points), using Parzen windows on the 2s segments.
(ii) EEG beta power, calculated as the average log power in the range from 14 to 30 Hz.
(iii) The state-space dimension of the EEG (D2): the singular value decomposition was based on the
autocovariance function with time-lags ranging from 0 to 32 points, corresponding to 0.25 s.
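Measures (i) and (ii) amount to a Welch-style average of Parzen-windowed periodograms. Note that the stated parameters (2048 points over 16 s, 2-s segments of 256 points, 32 lags for 0.25 s) imply an effective sampling rate of 128 Hz after downsampling; the sketch below adopts that assumption:

```python
import numpy as np

def parzen(n):
    """Symmetric Parzen (de la Vallee Poussin) window of length n."""
    k = np.abs(np.arange(n) - (n - 1) / 2.0) / (n / 2.0)
    return np.where(k <= 0.5, 1 - 6 * k**2 * (1 - k), 2 * (1 - k) ** 3)

def band_log_power(trace, fs=128.0, seg_len=256, step=128, lo=8.0, hi=12.0):
    """Average log power in [lo, hi] Hz over overlapping Parzen-windowed segments.
    A 2048-point trace yields 15 overlapping 2-s segments, as in the text."""
    win = parzen(seg_len)
    freqs = np.fft.rfftfreq(seg_len, 1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    logs = []
    for start in range(0, len(trace) - seg_len + 1, step):
        seg = trace[start:start + seg_len]
        spec = np.abs(np.fft.rfft((seg - seg.mean()) * win)) ** 2
        logs.append(np.log(spec[band].mean()))
    return float(np.mean(logs))

# Sanity check: a 10 Hz oscillation carries far more alpha- than beta-band power.
t = np.arange(2048) / 128.0
alpha = band_log_power(np.sin(2 * np.pi * 10 * t))                     # 8-12 Hz
beta = band_log_power(np.sin(2 * np.pi * 10 * t), lo=14.0, hi=30.0)    # 14-30 Hz
```

Beta power (measure ii) is obtained simply by changing the band limits, as in the last line.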
ANOVAs (analyses of variance) were calculated with the factors electrode row (left, middle,
right), electrode column (frontal, central, parietal), sound complexity (periodic, chaotic, stochastic),
and type of modulation (rhythmic, melodic, and both).
Two approaches were used: the first analysis did not differentiate between the musical background of
the subjects and included all three types of modulation. A second analysis was based on the subjects’
preferred type of music: seven preferred classical music and nine subjects preferred popular music.
We expected a differential responsiveness of these groups to the type of modulation. For this analysis
we used only the two pure types of modulation (mode "melody" and mode "rhythm").
Results
The analysis of the EEG dimension showed significant effects of electrode row (F(2, 30)=9.8; ε
=0.89, p<0.001) and of electrode columns (F(2, 30)=4.5; ε =0.91, p<0.05) which demonstrate a
non-uniform distribution over the head. With respect to the experimental variation, we found a
significant interaction of the electrode rows and complexity (F(4, 60)=7.1; ε =0.66, p<0.002). The low
dimensional chaotic music induced a reduction of the dimension mainly in the frontal electrodes,
compared to the periodic and the stochastic music, which showed no significant differences. The
parietal electrodes showed no significant effects of complexity while the central electrodes showed
moderate effects similar to the frontal electrodes. These effects were confirmed by post-hoc t-tests. No
effects of the type of modulation were found.
The analysis with the factor "music preference" (classical vs popular music) and the restriction to the
two pure types of modulation (melody vs rhythm) confirmed the above interaction of electrode rows
and complexity (F(4,56)=5.7; ε =0.51, p<0.008). In addition, we found a significant interaction
between groups, complexity, and type of modulation (F(2,28)=5.1; ε =0.98, p<0.02). Subjects
preferring classical music responded with a reduction of the EEG dimension if the melody modulation
was chaotic while subjects preferring popular music showed this effect when the rhythm was
modulated. For the complexity rating which was performed immediately after each trial, the 2 (group:
classical versus popular music preferred) by 3 (complexity condition: periodic, chaotic, stochastic) by
2 (mode: melody vs rhythm) ANOVA showed a significant effect of complexity (F(2,28)=5.7).
Discussion
Low-dimensional chaotic sequences produce a significant reduction in dimensional complexity
compared to both periodic and stochastic sequences. This was documented particularly in the
prefrontal regions. The phenomenon occurs for the melodic sequences in both groups, whereas in the
"popular music" group it was observed only in the case of the rhythmic sequences. If chaos levels
reflect the number of active cell-assemblies, then they could depict the aesthetic experience. Subjective
interest may at least in part be determined by stimulus complexity, whose neurophysiological
equivalent may consist in EEG complexity. Subjective interest, particularly in the musically
sophisticated subjects, reflects the richness or diversity of associative connections evoked by a
particular piece of music. On the other hand, the majority of less educated listeners prefer those rhythmical
modulations which evidently "pull" their brain activity into a less complex, periodic oscillatory
response, shutting off all competing assemblies. The difference between the three types of music (highly
and weakly chaotic, and periodic) is confined to frontal brain regions. The same result was found for
variations in intelligence (Lutzenberger W et al, 1992b; Schupp HT 1994) and differences between
mental imagery and perception of diverse objects. More intelligent subjects showed increased dimensional
complexity in the prefrontal regions. Apart from the general explanation that the production of music
seems to be an exclusively human trait, appearing as late in phylogenetic and ontogenetic evolution
as the prefrontal cortex, mental processes such as listening to music, creative thinking, and imagery
involve delay of immediately reinforced behavior and active working memory. Both cognitive functions
are more or less exclusively frontally located. It is therefore not surprising that the realization of the
"highest" (evolutionarily latest) cognitive skills requires the participation of additional frontal
cell-assemblies, which is expressed in an increased frontal dimensional EEG complexity.
Supported by the Deutsche Forschungsgemeinschaft (DFG)
References
Babloyantz A, Destexehe A "Low-dimensional chaos in an instance of epilepsy" Proc. Natl. Acad.
Sci. USA, 83: 3513-3517, 1986
Babloyantz A, Salazar JM, Nicolis C "Evidence of chaotic dynamics of brain activity during the sleep
cycle" Physics Letters, 111A: 152-156, 1986
Besson M; Faita F; Requin J "Brain waves associated with musical incongruities differ for musicians
and non-musicians" Neurosci Lett 1994 Feb 28; 168(1-2): 101-5
Birbaumer N., Flor H., Lutzenberger W., Elbert T "Chaos and order in the human brain" Perspectives
of Event-Related Potentials Research (EEG Suppl. 44): 450-459, 1995
dimensions of the EEG and its variations with mental tasks" Brain Topography 5: 27-33, 1992a
Lutzenberger W, Birbaumer N, Flor H, Rockstroh B, Elbert T "Dimensional analysis of the human
EEG and intelligence" Neurosci Lett 143, 10-14 1992b
North AC, Hargreaves DJ "Liking, arousal potential, and the emotions expressed by music" Scand J
Psychol 1997; 38(1): 45-53
Olsen LF, Schaffer WM "Chaos versus noisy periodicity: alternative hypotheses for childhood
epidemics" Science, 249: 499-508, 1990
Robazza C, Macaluso C, D’Urso V "Emotional reactions to music by gender, age and expertise"
Percept. Mot. Skills 1994 Oct; 79(2): 939-44
Petsche H; Pockberger H; Rappelsberger P "Music perception, EEG and musical training" EEG EMG
Z Elektroenzephalogr Elektromyogr Verwandte Geb 1985 Dec; 16(4): 183-90
Rössler OE, Wegman K "Chaos in the Zhabotinskij reaction" Nature, 271: 89-90, 1978
Schupp H T, Lutzenberger W, Birbaumer N, Miltner W, Braun C "Neurophysiological differences
between perception and imagery" Cognitive Brain Research 1994 2,77-86
Tekman HG "A multidimensional study of preference judgements for excerpts of music" Psychol Rep
1998 June: 82 (3Pt1): 851-60
Tervaniemi M; Ilvonen T; Karma K; Alho K; Näätänen R "The musical brain: brain waves reveal the
neurophysiological basis of musicality in human subjects" Neurosci. Lett. 1997 Apr 18; 226(1): 1-4
Terwogt MM, Van Grinsven F "Recognition of emotions in music by children and adults" Percept.
Mot. Skills 1988 Dec, 67(3): 697-8
Wallbott HG "The ‘euphoria’ effect of music videos: a study of the reception of music with visual
images" Z. Exp. Angew. Psychol. 1989; 36(1): 138-61
Back to index
Background. Unlike previous research into influences on musical development, which has looked at
the child and usually only one parent's account of that child's musical life, this study examines in
detail the identities of all family members in order to produce an integrated picture of the underlying
dynamics surrounding music within the home.
Aims. This study examines generational influence, claiming that a child's musical identity and
'success' are directly shaped by the parents' social backgrounds.
Method. The study gleans best practice from clinical family therapy where it has been recognised that
to fully understand a person, the other sides of the story must be acknowledged. This entails each
member of the family telling his/her own story so that all constructions are assessed within a
qualitative quasi-anthropological framework. Using semi-structured interview techniques, this study
examines the multiple layers of interaction within twelve families where all have at least one child
learning an instrument.
Results. The concept of script patterning can be used to explain the way in which a child's musical
identity is influenced by his/her parents. Although peer pressure is undoubtedly significant, it is
parents' attitudes to music, themselves determined by the parents' own social makeup, that are found to
be fundamental in fashioning their children's musical success.
Conclusions. Although it cannot be disputed that significant others outside the home context play a
key role in developing a child's musical interest and skills, the social identities of the parents
themselves, from their respective Families of Origin, seem directly instrumental in fashioning the
musical outcome of their children.
Back to index
Proceedings paper
From my hand to your ear: The faces of meter in performance and perception.
Caroline Palmer and Peter Q. Pfordresher, Ohio State University
Background.
Although much theoretical and experimental study in music cognition has examined the role of meter
in perception, less work has examined the role of meter in music performance. On the one hand, many
studies have documented that perception of meter may arise from a variety of acoustic cues, and
metrically regular patterns allow more accurate perception of and memory for music (e.g. Jones &
Pfordresher, 1997; Povel, 1981). Such evidence has inspired theories to posit that the perception of
meter arises from attention to surface-level periodicities in a sequence that generate expectancies by
driving internal rhythmic oscillations (Large & Jones, 1999). Other work has suggested that meter
may serve as a well-learned abstract schema that guides listeners' interpretation of strong and weak
beats even in the absence of surface cues (Palmer & Krumhansl, 1990); although meter stems from
the musical surface, it is not entirely dependent on surface structure.
In contrast to the perceiver's task, performers do not have to derive the meter; they know it
beforehand. Furthermore, they often choose not to emphasize the meter in terms of the acoustic cues
found useful for listeners. This may be because expressive nuances in performance are for the most
part subtle, and metrical accents interact with many other accents in terms of performers' expressive
nuances (Drake & Palmer, 1993). A performance that emphasized meter might even be considered
exaggerated and unmusical. Yet evidence from many performance situations suggests that meter is far
from irrelevant in performance. Pitch errors in experienced pianists' performances of well-learned
music reflect the tactus or metrical level considered most important (Meyer & Palmer, submitted).
The precision of performance timing, as measured by deviations in interonset intervals, suggests that
some metrical levels are more directly timed than others (e.g. Shaffer, 1981). Production of event
sequences that match a metrical framework is often more accurate than production of sequences that
do not (Povel, 1981). Finally, performance of the complex meters present in polyrhythms
demonstrates that metrical complexity is an important dimension of performance (Handel, 1989).
Music-theoretic approaches to meter suggest at least two alternatives for the role of meter in musical
structure: as a time-based metric, in which metrical beats are separated by equal time-units, or as an
accent-based metric, in which metrical beats are distinguished by accents. Both approaches point to
some regularity in the pattern of events. That regularity can be defined in terms of accent strength;
meter can be described as an alternation of strong and weak accents, usually in binary or ternary
alternation between strong beats (e.g. Cooper & Meyer, 1960). The regularity can also be defined in
terms of time spans; a beat is defined in terms of a point in time and the time elapsed between one
beat and the next offers a source of regularity (e.g. Lerdahl & Jackendoff, 1983). The accent-based
approach assumes only an ordinal scale, which means that the downbeat is stronger than the second
beat and so on. Ordinal scales for meter make the assumption that strong beats are separated by weak
beats but do not rely on assumptions about the ratios of timespans between such beats. The timespan
approach defines meter as a periodic alternation of strong and weak beats and incorporates an
assumption of a ratio scale of events, which means that one timespan has twice the duration of
another, and so on. The assumption of an ordinal versus ratio scale for meter has important
implications: conclusions such as relational invariance of timing across tempo in performance follow
from the ratio-scale metric, but not from the ordinal-scale metric.
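The relational-invariance point can be made concrete: under a ratio-scale metric, a tempo change multiplies every timespan by the same factor, so the ratios between successive intervals are preserved; an ordinal metric only preserves the strong/weak ordering. A hypothetical sketch with assumed interval values:

```python
# Interonset intervals (in seconds) for one measure at an assumed base tempo.
iois = [1.0, 0.5, 0.5, 1.0]

def change_tempo(iois, factor):
    """Ratio-scale tempo change: every timespan is scaled by the same factor."""
    return [t * factor for t in iois]

slower = change_tempo(iois, 1.5)

# Relational invariance: ratios between successive intervals are unchanged.
ratios = [b / a for a, b in zip(iois, iois[1:])]
ratios_slow = [b / a for a, b in zip(slower, slower[1:])]
```

Empirical performance timing deviates from such perfect rescaling, which is one reason the paper adopts the weaker ordinal assumption.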
Theories of perception and performance tend to differ in the scale assumptions they make. Both
music-theoretic (Hasty, 1997) and psychological approaches (Jones, 1976; Large & Jones, 1999)
propose that interactions between rhythms of the preceding event structure and rhythmic
predispositions of the listener generate expectancies such as meter. These theories assume that the
perception of rhythm incorporates an underlying ratio scale and can explain findings such as listeners'
detection of timing deviations and categorization of ratio-based time intervals (Jones & Yee, 1997;
Large & Jones, 1999).
However, there is less evidence that performance can be explained by invoking similar ratio-based
models. For instance, the ratio-scale assumptions conflict with evidence that performers do not use
ratio-based time units - timing in performance is always fluctuating. Also, some work suggests that
listeners can use temporal cues other than simple ratio intervals to perceive meter (Large & Palmer, in
preparation). In addition, ratio-based theories do not explain perceived similarity among the musical
events that make up a performance; therefore, it is difficult for such an approach to explain memory
confusions that arise in performance errors, such as the common error of substituting the correct event
with one intended for a nearby location in the same musical sequence: a serial ordering error. Because
of these problems, and the simplicity of the ordinal time scale inherent in the accent-unit approaches
to meter, we rely in this paper on an accent-based (ordinal) approach to meter in performance.
Another distinction to consider between perception and performance of music is the
role of memory. Both perception and performance require integration of musical events over time in
memory. Related work in psychology of memory suggests that behaviors as diverse as speech,
categorization, and decision-making reflect temporal constraints on short-term memory.
Developmental work suggests that older children show increased temporal persistence of auditory
sensory memory relative to younger children, as well as increased storage of phonological (verbal)
information (Gathercole, 1999). A related finding suggests that children make relatively more serial
order errors than adults (Brown et al, in press). One explanation offered for these findings is that
younger children's slower mental rehearsal leads to faster decay of information over time. If memory
demands in general play a larger role in performance than in perception, then temporal constraints on
memory may be more apparent in music performance, especially at faster tempi.
Two problems arise in the performer's memory for sequences of events: knowing what to do next (the
serial order problem) and knowing when to do it (the relative timing problem). Early work in memory
for sequences of words, tones, and other lists showed that when we are required to remember a
sequence of items, we often remember the items but not the order in which they occurred (Gathercole,
1999). This result suggested that there is an important difference between remembering the items in a
sequence, and remembering the order in which those items occurred. However, these two dimensions
may not be separate in memory for hierarchically organized sequences such as music. For example,
both speech errors and music performance errors tend to reflect sequence events intended for
elsewhere in the sequence that arose from the same phrase rather than from a different phrase,
suggesting that mistakes in serial order are not random but instead reflect hierarchical constraints on
memory for the sequence (Garcia-Albea et al, 1989; Palmer & van de Sande, 1995). The question we
address here is: is this scope constraint on how much of a musical sequence is accessible in memory
based on ordinal or ratio-scale properties? That is, are elements within a sequence related in memory
in terms of their ordinal properties (such as same or different phrase), their ratio properties (such as
twice the duration or half the duration), or both?
greatest number of accent levels in the grid. Thus, the number of total event locations in one cycle of
the grid, n, is determined here by the number of divisions from the lowest level to the highest level.
The metrical accent strength of each event, m, is represented by the length of each vector, which is
equal to the number of metrical levels in the grid with which that event coincides. Note that the
accents are not temporally defined; the grid can stretch or shrink to fit the tempo of the sequence.
Time is defined in terms of a serial (proximal) component of the model, described later.
The first component of the model, metrical similarity (Mx), defines the similarity in metrical accent
strength between sequence events. The absolute difference in metrical strength between an event at
position i and another event at distance x (position i+x) is computed and divided by the sum of the
metrical accents for the two events. That difference is subtracted from 1 to form a similarity metric, as
follows:
Equation 1: Mx(i) = 1 - |m(i) - m(i+x)| / (m(i) + m(i+x))
The right-hand side of Equation 1 reflects the fact that this function for metrical similarity is a form of
Weber's law. This is psychologically appealing because it captures the perceptual analogue that listeners
are more sensitive to a given accent difference between a pair of events when it is presented in a context of
low-intensity accents than in a context of high-intensity accents.
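This similarity computation can be sketched directly from the definitions above, using an assumed 4-tier accent grid for one 4/4 measure in eighth notes (4 = downbeat, 3 = mid-measure beat, 2 = other beats, 1 = offbeats); the grid values are illustrative, not taken from the paper:

```python
import numpy as np

# One cycle of a 4-tier metrical grid (4/4 in eighth notes): m_i = number of
# grid levels each position coincides with.
m = np.array([4, 1, 2, 1, 3, 1, 2, 1])

def metrical_similarity(m, x):
    """Equation 1 at every position i: 1 - |m_i - m_{i+x}| / (m_i + m_{i+x}),
    treating the grid as cyclic."""
    shifted = np.roll(m, -x)
    return 1.0 - np.abs(m - shifted) / (m + shifted)

def M(m, x):
    """Equation 2: metrical similarity averaged over all n grid positions."""
    return metrical_similarity(m, x).mean()
```

On this grid, events a half-measure apart are metrically more similar on average than adjacent events, which is what lets the model predict errors migrating between metrically parallel positions.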
Figure 1: Frequency histogram of the number of serial order (pitch) errors by event distance, in
performances of the Bach Prelude in D Major (from Meyer & Palmer, in preparation).
Figure 2: Circular representation of metrical accent strength for a 4-tier metrical grid. Metrical strength
of each event is represented by the length of each vector; concentric circles represent each level in the
grid.
Similar contrast functions have been used in vision to model the detection of luminance differences
(Michaelson, 1906). This metrical similarity measure is summed across all positions in a sequence and
divided by the total number of positions n, to generate the vector of metrical similarity values across
distances from the current event, Mx, as follows:
Equation 2: Mx = (1/n) Σi [1 - |m(i) - m(i+x)| / (m(i) + m(i+x))]
The second component of the model, serial proximity (Sx), captures the fact that memory for sequence
events is less accessible the farther away they are from the performer's present position in the
sequence. Event strength (Sx) is assumed to be maximal at the current position and equal to 1; event
strength for other sequence events decreases both with increasing absolute event distance (x) from the
current event and with decreasing event duration (t, defined here as seconds per event), in the following
nonlinear relationship:
Equation 3: Sx = a^(x/t)
This function takes an initial activation a, a value between 0 (no activation) and 1 (total activation)
that represents temporal constraints on short-term memory. The exponent (x / t) refers to number of
events per unit time (similar to beats per minute in musical terms) and leads to two predictions. First,
the larger x is (the farther away an event from the current event), the weaker the event strength.
Second, the smaller t is (as tempo gets faster), the weaker the event strength. Thus, the serial
component represents a proximity-based combination of decay (over elapsed time) and interference
(over intervening events). Sequence events from the future and the past will decay faster as the
number of intervening events increases and as the rate increases.
The model makes a basic assumption common to many formal models of memory, that sequence
elements can be represented as vectors of relations among elements. The metrical similarity and event
strength of each sequence element at each distance from the current event are represented in M and S
vectors, respectively. Position within the vector represents comparisons among sequence events at
different positions and distances; the vector size is equivalent to one metrical cycle.
Finally, the two components of the model are combined in a multiplicative fashion to predict relative
event strength or activation for any event x at time t as the product of metrical similarity and serial
activation (Sx ⋅ Mx) function. The relative activations of sequence elements at each distance x from
the current event are then normalized to determine relative error probabilities for each sequence event.
The error probabilities for each event distance from the present event reflect the fact that sequence
events from greater distances have greater event strength in some cases than sequence events from
smaller distances. This is psychologically appealing because it reflects Garrett's (1980) caveat that
although speech errors often reflect access to sequence events from some distance from the error
location, it does not follow that a speaker has access to all intervening events. This model is the first
to make specific predictions for which elements are more or less accessible from various sequence
distances.
The model makes a further prediction for the absolute mean distance between any serial order error
and its target pitch. The mean range is computed as the weighted sum of the error probabilities at each
sequence distance multiplied by each sequence distance. This is shown in Equation 4.
Equation 4: mean range = Σx x · p(x), where p(x) is the normalized error probability at sequence distance x
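Putting the pieces together, the product model and the mean-range prediction can be sketched as follows; the accent grid, initial activation a, and distance range are illustrative assumptions rather than fitted values:

```python
import numpy as np

m = np.array([4, 1, 2, 1, 3, 1, 2, 1])   # assumed accent grid (4/4, one cycle)

def error_probabilities(t, a=0.5, max_dist=8):
    """Normalized S_x * M_x over distances x = 1..max_dist (future side only;
    the model treats past and future symmetrically)."""
    xs = np.arange(1, max_dist + 1)
    S = a ** (xs / t)                     # Equation 3: serial proximity
    Mx = np.array([(1 - np.abs(m - np.roll(m, -x)) / (m + np.roll(m, -x))).mean()
                   for x in xs])          # Equations 1-2: metrical similarity
    act = S * Mx                          # multiplicative combination
    return xs, act / act.sum()            # relative error probabilities

def mean_range(t):
    """Equation 4: error probabilities weighted by their sequence distance."""
    xs, p = error_probabilities(t)
    return float((xs * p).sum())

# Prediction: a slower tempo (larger t, seconds per event) widens the range of
# planning, i.e. errors come from more distant sequence positions on average.
```

Running `mean_range` at a fast and a slow tempo reproduces the qualitative prediction stated next: the mean absolute range is smaller at the faster tempo.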
A final prediction of the model is that for any two tempos t1 and t2, such that t1 is less than (faster
than) t2, the mean absolute range for t1 will be smaller than the mean absolute range for t2. This
follows from the predictions of the product model (combination of metrical and serial components):
the serial component decreases the activation of sequence elements from farther distances faster for t1
than for t2, in essence damping the effect of metrical similarity for events from larger distances. This
fact, combined with the facts that sequence events at closer distances are more accessible at fast than at
slow tempi and that sequence events at farther distances are more accessible at slow than at fast tempi,
accounts for the general prediction that events will be accessible from greater sequence distances on
average at slower performance tempi than at faster tempi.
Palmer, Pfordresher and Brink (in preparation) tested the model's predictions in two experiments.
Pianists performed simple musical excerpts during which both practice and production rate (tempo)
were varied to test the predictions of the model. Increased practice and slower rate both led to fewer
errors. Performances at slower tempi generated a larger range of planning, with sequence elements
arising in errors from greater distances. Furthermore, more errors reflected nearby elements when the
music was performed at a faster tempo, and more errors reflected distant elements when performed at
a slower tempo. In a second experiment, novice child pianists performed the same task. The
performances showed relatively more serial order errors, consistent with psychological theories of
short-term memory processes that predict faster decay of information for children than adults (Brown
et al., in press). Because the initial activation parameter of the serial component predicts faster decay
of information in memory for children than for adults, the novice performances showed less
contribution of metrical frames to range of planning than the expert performances (these findings are
described further in Palmer, Pfordresher and Brink, in preparation).
Implications:
We have presented a model of a metrical framework based on accent similarity that guides the
retrieval and organization of musical events during performance; the metrical component of this
model highlights some features that may be unique to performance. First, the metrical similarity
component of the model predicts a symmetrical influence of past and future events relative to the
present. This feature may be specific to performance because memory for past and future events may
be simultaneously available, and the weighting of memory may be symmetrical. Perceptual tasks, in
contrast, may reflect a lower memory load, but the burden of generating expectations for upcoming
events from past events may force asymmetrical influences of past and future on the
processing of current events. Second, the reliance on metrical grids requires only an ordinal-scale
assumption about metrical similarity, which is sufficient to generate predictions about memory
retrieval in music performance.
The proximity-based decay component of the model specifies how temporal constraints of short-term
memory can influence serial order of events in performance. The metrical and serial components
interact to moderate the influence of meter on a performer's scope of planning at different tempi.
These psychological consequences of time in memory for musical sequences may also extend to other
aspects of music performance, such as those related to tempo effects on musical interpretation and
relational invariance of motor programs.
Are principles such as metrical similarity and temporal proximity common across music perception
and performance? Strangely enough, similarity and proximity principles are more commonly found in
perceptual theories but rarer in performance theories. The model of metrical similarity described here
is simple because it has relatively few parameters: production rate, metrical grid size, and initial
activation strength. The first is established by the experimental conditions; the second is schematic
(general and abstract) and acquired through exposure, and the third reflects temporal constraints on
short-term memory and is posited to increase with age. In principle, none of these parameters need be
specific to music: even the metrical schemas, which might be most specific to musical styles and
periods, resemble in fundamental ways those proposed for language (Hayes, 1984), and perception of
their component periodicities has been modeled with dynamical systems (Large & Jones, 1999).
The model's simplicity also gives rise to its limitations. Perhaps most important is its adherence to
only one dimension of similarity among musical events: that of meter. Research in music performance
has documented other musical dimensions that influence similarity judgments or confusion errors,
including melodic contour, tonality, harmony, rhythm, and timbre (cf. Palmer & van de Sande, 1993,
1995). Another limitation is the model's inability to explain how the metrical grid is learned.
Statistical analyses of frequency distributions of note events across metrical positions document the
common compositional technique of establishing a meter by putting more notes in positions of
metrical strength (Palmer 1996; Palmer & Krumhansl, 1990), but how these are acquired in memory
is unsolved. More recently, dynamical systems models have been posited that track events over time
and generate expectancies for when events will occur, based on prior sequence structure (Large &
Jones, 1999; Large & Palmer, in preparation). Oscillators with adjustable period and phase
components may respond to periodicities represented at each level in a metrical grid, offering an
explanation of how metrical frameworks are acquired.
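As an illustration of this last point, here is a minimal discrete sketch (not the Large & Jones model itself, which uses continuous-time oscillators) of an oscillator with adjustable period and phase locking onto one periodicity of a metrical grid; the correction gains alpha and beta are arbitrary:

```python
def track(onsets, period, alpha=0.8, beta=0.2):
    """Predict each onset from the current period; correct phase (alpha)
    and period (beta) by fractions of the prediction error."""
    t_pred = onsets[0]           # first expectancy coincides with first onset
    errors = []
    for onset in onsets:
        err = onset - t_pred     # asynchrony between event and expectancy
        errors.append(err)
        period += beta * err             # period (tempo) adaptation
        t_pred += period + alpha * err   # phase-corrected next expectancy
    return period, errors

# A beat that is slower (0.6 s) than the oscillator's assumed period (0.5 s):
period, errors = track([0.0, 0.6, 1.2, 1.8, 2.4, 3.0], period=0.5)
print(round(period, 2))  # the adapted period approaches 0.6
```

The shrinking asynchronies show the oscillator entraining to the event periodicity, which is the sense in which such systems could acquire the component levels of a metrical framework.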
Musical meter provides an important testing ground for comparing the role of temporal sequence
structure in perceptual tasks (such as beat tracking) and motor tasks (such as music performance).
Comparisons between these domains may lead to the identification of different constraints on
attention and memory processes, as well as some similarities. For example, the psychological
constraints in the metrical similarity model described here reflect general principles that can be tested
in perceptual tasks, a necessary step in bridging the gap from the pianists' hand to the listeners' ear.
Acknowledgements
This research was sponsored in part by NIMH grant R01-45764 to the first author. Reprint requests
should be addressed to Caroline Palmer, Psychology Dept., Ohio State University, 1885 Neil Ave.,
Columbus Ohio 43210, USA, or to palmer.1@osu.edu.
References
Brown, G.D.A., Vousden, J.I., McCormack, T., & Hulme, C. (in press). The development of memory
for serial order: A temporal-contextual distinctiveness model. International Journal of Psychology.
Cooper, G., & Meyer, L.B. (1960). The rhythmic structure of music. Chicago: University of Chicago
Press.
Drake, C. & Palmer, C. (1993) Accent structures in music performance. Music Perception, 10,
343-378.
Garcia-Albea, J.E., del Viso, S., & Igoa, J.M. (1989). Movement errors and levels of processing in
sentence production. Journal of Psycholinguistic Research, 18, 145-161.
Garrett, M.F. (1980). Levels of processing in sentence production. In B. Butterworth (Ed.), Language
production: Speech and talk (pp. 177-220). London: Academic Press.
Gathercole, S.E. (1999). Cognitive approaches to the development of short-term memory. Trends in
Cognitive Sciences, 3, 410-418.
Handel, S. (1989). Listening: an introduction to the perception of auditory events. Cambridge: MIT
Press.
Hasty, C.F. (1997) Meter as rhythm. New York : Oxford University Press.
Hayes, B. (1984). The phonology of rhythm in English. Linguistic Inquiry, 15, 33-74.
Huron, D., & Royal, M. (1996). What is melodic accent? Converging evidence from musical practice.
Music Perception, 13, 489-516.
Jones, M.R. & Pfordresher, P.Q. (1997). Tracking musical patterns using joint accent structure.
Canadian Journal of Experimental Psychology, 51, 271-290.
Jones, M.R. & Yee, W. (1997) Sensitivity to time change: the role of context and skill. Journal of
Experimental Psychology: Human Perception & Performance, 23, 693-709.
Large, E.W., & Jones, M.R. (1999) The dynamics of attending: How people track time-varying
events. Psychological Review, 106, 119-159.
Large, E.W., & Palmer, C. (in preparation). Temporal response to music performance: Perceiving
structure in temporal fluctuations.
Lerdahl, F., & Jackendoff, R. (1983) A generative theory of tonal music. Cambridge: MIT Press.
Liberman, M.Y., & Prince, A. (1977) On stress and linguistic rhythm. Linguistic Inquiry, 8, 249-336.
Meyer, R.K. & Palmer, C. (submitted). Temporal control and planning in music performance.
Palmer, C. (1996) Anatomy of a performance: Sources of musical expression. Music Perception, 13,
433-454.
Palmer, C., & Krumhansl, C.L. (1990). Mental representations of musical meter. Journal of
Experimental Psychology: Human Perception and Performance, 16, 728-741.
Palmer, C., Pfordresher, P.Q., & Brink, D. (in preparation). Music errors, speech errors, and cognitive
constraints on sequence production.
Palmer, C., & van de Sande, C. (1993). Units of knowledge in music performance. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 19, 457-470.
Palmer, C., & van de Sande, C. (1995). Range of planning in music performance. Journal of
Experimental Psychology: Human Perception and Performance, 21, 947-962.
Povel, D.J. (1981) Internal representation of simple temporal patterns. Journal of Experimental
Psychology: Human Perception and Performance, 7, 3-18.
Shaffer, L.H. (1981) Performances of Chopin, Bach, and Bartok: Studies in motor programming.
Cognitive Psychology, 13, 326-376.
Sloboda, J.A. (1983). The communication of musical metre in piano performance. Quarterly Journal
of Experimental Psychology, 35, 377-396.
Thomassen, J.M. (1982). Melodic accent: Experiments and a tentative model. Journal of the
Acoustical Society of America, 71, 1596-1605.
Back to index
Proceedings paper
Reinhard Kopiez
Hochschule für Musik und Theater
Emmichplatz 1, 30175 Hannover, Germany
Phone: +49-511-3100 608, Fax: +49-511-3100 600
e-mail: kopiez@hmt-hannover.de
INTRODUCTION
Since Seashore's (1938/1967, p. 218) ground-breaking studies of violin intonation, it has been well
known that we have to differentiate between tuning (an idealized system of pitch relations, such as
just or meantone tuning) and intonation (what the performer really does). As he was able to
show, the intonation of the same interval in a piece of music can be described by a distribution of
deviations, which depends on the musical context. We also know that the intonation of a performance is
influenced by expressive deviations, due to the melodic structure of a piece, the performer's expertise,
and the instrument's imperfections. To date, little attention has been paid to the performer's
role. For example, expertise theory would predict that experts can adapt to different task
constraints (e.g. different tuning systems) to a high degree. But as Fischer (1996) reports in his
meta-analysis, there are no studies using controlled, varied conditions, although we can find numerous
studies of ensemble intonation (e.g. in a string quartet). A recent extensive study by Fyk (1995) of
violin intonation concentrates on solo performance and confirms the dynamic, context-sensitive
character of intonation. As she was able to demonstrate, in melodic (horizontal) intonation players
tend to increase the size of large intervals and to compress intervals smaller than a fifth (see
also Rakowski, 1990).
Due to this lack of research, this study addresses the following questions:
● How can a musician cope with an instrument's technical obstacles and adapt to a given tuning
system?
● How reliable is intonation over different renditions?
● Are there any evident effects of tonal gravitation (Fyk, 1995), causing "islands of intonational
stability", or can we observe overall stability, independent of the interval category?
● How important is the degree of expertise for successful task adaptation?
METHOD
Material
A short piece of music was composed, which had to fulfill the following demands:
● No expressive melodic movement, to avoid an overlap of harmonic intonation and expressive
melodic intention
● A slow tempo, to enable the player to listen to and adjust his intonation without too many time
constraints. Additionally a slow tempo would deliver signal durations which would leave a
quasi-stationary part of the note after removal of the attack and decay.
● Technical simplicity, to render the player free from technical obstacles
● A four-part structure, to simulate an ensemble timbre over which the subject could play the
upper part (embedded interval paradigm)
● An A-B-A form with a modulation in the B-part, to test subject's adaptation to harmonic
changes
● Chordal progression in root position only, to make identification of the harmonic context easier
● Concentration on only a few test intervals: prime (unison), minor third, major third, fifth, and
minor seventh.
Example 1 shows the score of the short composition.
Two 3-part tuning versions were generated from the original MIDI file, using the software
RealTimeTuner for Macintosh (V 1.2) by William Cooper
(http://socrates.berkeley.edu/~wcooper/realtimetuner.html) for the generation of the just version in
5-limit tuning. The option "automatic chord following" was used, which corresponds to an adaptive just
tuning (for intonation details see Blackwood, 1985). Next, sound files for CD recording were
generated from the MIDI files, using a Yamaha sampler (TG 77) with the sound "French horn".
Subjects
Two trumpet players took part in the experiment. Player "H" (24 years) was a trumpet student at a
music conservatoire. He had been playing for 15 years. His additional monthly ensemble activities
added up to 6-10 hours, depending on seasonal activities. Subject "K" (39 years) was a professional
orchestral player. He had been playing for 28 years, 19 of those professionally. His additional monthly
ensemble activities added up to 15-20 hours, mostly in an ensemble for avant-garde music. Subject "K"
was recommended by a conductor for his outstanding intonational skills.
Procedure
Subjects received a practice CD 10 days before the recording session. The CD contained the
test composition in equal temperament (ET) and in the just intonation (JI) rendition described above,
in a 3-part version, plus a repeated sample tone (Eb) to tune up to. The score and solo voice were
added as a printout. A short written introduction to the subject's task was given, and subjects were
asked to note down the time spent practising during the 10-day preparation phase. Their main task was
to produce a "best fit" to the indicated tunings on the CD.
The recording procedure took place in each subject's home. Subjects listened to the 3-part
accompaniment (first in ET, then in JI) through headphones and played the upper voice. The
recording was made with a DAT recorder, using a microphone (Sennheiser E 608) attached
directly to the instrument's bell. Five renditions in each tuning system were recorded. The recording
session lasted about one hour. As a control variable, and to assess the subjects' perceptual skills, a
"surprise" informal aural test was constructed, consisting of a cadence in three tuning systems: (a)
Pythagorean, (b) equal tempered, and (c) just. After a short trial section explaining the features of
each tuning system, subjects listened to the cadences in the sequence a-b-c-c-b-a. Subjects
recognized the sequence faultlessly.
RESULTS
Solo voice recordings were sampled onto hard disk (sample rate = 11.025 kHz) and a pitch analysis of
each of the 21 notes was calculated, using the software PRAAT for Macintosh (V 3.8.16) by Paul
Boersma. The first and last 200 ms of each note were removed, so that only the quasi-stationary
part of each note, with a duration of 1.6 s, was analyzed (see Fyk, 1995, p. 65). The module
"Periodicity" was chosen, with an FFT size of 16,384. This results in a frequency resolution of 0.67
Hz, corresponding to a smallest difference of 1.4 c in the vicinity of 622 Hz (Eb5). Intonational
deviations were calculated using EXCEL and SPSS (V 9.0). Although figure 1 represents statistical
means of repeated renditions, a natural range of intonational deviations could be observed.
As a first step, data were analyzed with a repeated-measures general linear model using the factors
tuning (2) * version (5) * interval (5) * player (2). The first factor, "tuning", showed a significant
effect (Hotelling's T: F(1,32) = 8.5, p = 0.006). ET had a mean overall deviation of 4.9 c (standard
error 0.4) and JI of 7.5 c (standard error 0.5) (see figure 1). As we are interested in the
null hypothesis (which states that there is no difference between the adaptation to different tuning
systems), we would have to reach at least an alpha error level of > 0.20. Our findings show that the
adaptation to ET is significantly better than to JI (see figure 1).
As a next step we analyzed the degree of adaptation to single intervals in the two tuning systems. The
analysis of variance showed a significant tuning * interval interaction (F(4,32) = 38.7, p < 0.001).
This means that the intonational deviations differed between the 5 interval categories. The
factor "player" showed no significant effect, which allows us to concentrate on the explanation of the
observed interval effect. Let us try a simple explanation: we can hypothesize that the players used the
same intonation (ET) for both tasks. Although this explanation is somewhat provocative, it is supported
by the fact that the high minor third from ET (+9.2 c) fits extremely well when used in this particular
JI rendition (where it should be +15.6 c). But if the major third from ET, played only 2 c higher than
its ideal pitch, is transferred to JI, it is much higher (+16 c) than expected in JI (-13.7 c). The minor
seventh is too high in ET (+1.7 c), and if transferred to JI, where the seventh would be expected to be
3.9 c lower than in ET, this error results in a total deviation of +7 c for the minor seventh.
To sum up some other results: the tuning * interval-interaction produced a significant effect. Major
and minor thirds showed the smallest deviations in ET, and minor thirds the best performance in JI.
The interval-performance did not improve over the five renditions and players did not differ in their
performance in the two tuning tasks.
DISCUSSION
We can confirm the predictions from expertise theory, namely that expertise is always
domain-specific (see Ericsson, 1996): there is no evidence of successful task-adaptation if there has
not been enough time for skill acquisition. The standard tuning system for a trumpet player in an
orchestra is the equal tempered system. This seems to be already internalized in the early stage of
higher music education, as with a conservatoire student. The student player "H" already shows a
remarkable instinct for his major thirds and minor sevenths. From this point we can say that subject
"H" was not a novice, and his near-perfect adaptation to ET was therefore not unexpected. There is no
evidence for
an automatic adaptation to a so-called "natural" tuning system like JI. Both performers had far less
expertise in ensemble playing (without piano) and had had little chance to acquire intonational skills
for JI to the same extent as for ET. We can assume that the player's adaptation to JI would be much
better for instance after one week of intensive rehearsals in a brass ensemble. On the basis of
Sundberg's (1987, p. 178) studies on intonation in barbershop singing which showed that these singers
can adapt to beat-free just intonation with a mean deviation of less than 3 c, we can hypothesize that
expert musicians are capable of perfect task adaptation. These results open perspectives for music
education. The surprisingly successful adaptation to the "unnatural" ET system shows that only
deliberate practice is required to adapt to a given task. So we cannot support Vogel's assumption (see
quotation in the header) of the unattainability of ET for brass instruments. From our point of view the
role of the human factor, the professional musician and his skill to compensate for a wide range of
imperfections has been underestimated. In other words: not the trumpet, but the trumpeter makes the
music.
Acknowledgement
A full documentation of the experiment including the sound examples can be obtained from the URL
http://musicweb.hmt-hannover.de/intonation
REFERENCES
Blackwood, E. (1985). The structure of recognizable diatonic tunings. Princeton: Princeton University
Press.
Ericsson, K.A. (Ed.) (1996). The road to excellence. The acquisition of expert performance in the arts
and sciences, sports and games. New Jersey: Erlbaum.
Fyk, J. (1995). Melodic intonation, psychoacoustics, and the violin. Zielona Góra: Organon.
Fischer, M. (1996). Der Intonationstest. Seine Anfänge, seine Ziele, seine Methodik. [The intonation
test. Its history, its aims, and its method]. Frankfurt: Lang.
Rakowski, A. (1990). Intonation variants of musical intervals in isolation and in musical contexts.
Psychology of Music, 18, 60-72.
Seashore, C.E. (1938/1967). Psychology of music. Reprint, New York: Dover Publications.
Sundberg, J. (1987). The science of the singing voice. Illinois: Northern Illinois University Press.
Vogel, M. (1961). Die Intonation der Blechbläser. [Intonation on brass instruments]. Bonn: Orpheus.
Back to index
Proceedings paper
I suggest that while neither intervallic, tonal, voice-leading, or exact rhythmic configurations--the supposed constituents of musical motives--are retained in these variants,
a different kind of motive emerges: a complex of simple, broadly-defined relationships in various parameters. Here, this complex consists of a wide fall in pitch,
articulated legato, from an accented yet shorter note to an unaccented, yet longer one.
Beethoven defines this complex as a musical idea through a careful, gradual process, in which intervallic and rhythmic figures are altered, while the above broad features
are retained (Example 1b). First (starting at x3), the original leap of a perfect fifth is stretched; then, while the continued process of intervallic stretching and a held upper
pitch provide for association between successive motives, the truncated and rhythmically diminished x4-x6 are presented, embedded within a variant of y. Once
Beethoven's "rupture" gesture has two contrasting directions. In one (Examples 2a, 2c, 2d), rupture is generated through a sudden decline in intensity; in the other
(Example 2b), through an extreme and sudden intensification. Though both versions are most frequently generated through dynamics, dynamic changes are often
accompanied or replaced by analogous actions in other parameters. Thus, sudden decrease in intensity is delineated through a subito piano, sometimes following a
crescendo (Example 2c), but also by a sudden slowdown of rhythmic activity (often through the elimination of a dense figuration) or a drop to a lower register (Example
2d). Abrupt intensification may involve a subito ff, but also increased attack rate and textural density, as well as a steep registral expansion and a pitch rise (Example 2b).
Furthermore, while initially performed by "brute" agents like dynamics and attack rate, in later variants the rupture is enacted by harmonic syntax itself, as the disrupting
function is rendered by replacing an expected cadential resolution with a dissonant, chromatically altered, substitute (see Example 2e, the dramatic open ending of the
sonata's 2nd movement). Harmonic syntax thus emulates gestures established by the supposedly "secondary" parameter of dynamics--a surprising discrepancy with our
cherished hierarchy of musical parameters in tonal music.
What makes the figures in Example 2 variants of a single gestural "motive" is, then, not their shared musical features but the act they all perform: severing an anticipated
termination from its "body"--the figure or phrase it is supposed to close and resolve--through an abrupt and extreme change. The musical parameters and procedures
enacting this change, its "direction" (increase or decrease in intensity), and its specific melodic, rhythmic, and harmonic constituents may all be different in each case. It is
the act itself that is motivic here, not any specific musical rendering of this act.
How may a listener hear such gestures and respond to them, consciously or subconsciously? Obviously, an ability to perceive immediate and short-range structural
relationships is a prerequisite. For instance, in Example 2c continuity from m. 40 to 41 is suggested by local harmonic and voice-leading implications (m. 41 resolves a
dissonant V34 harmony and its upper-voice leading-tone, and serves as a goal of a passing I-V34-I6 progression) and by rhythmic and melodic conformance with preceding
figures and phrases (the rupture in m. 41 interferes with a repetition of the mm. 35-39 phrase and severs an omnipresent rhythmic figure). The impact of dynamic rupture
in m. 41, which clashes with these implied continuities, cannot be perceived, then, without sensitivity to the immediate implications of harmonic syntax, voice-leading,
and grouping structure (though not necessarily to their larger, long-range aspects). Yet the various figures in Example 2, enacted by very different parameters and
procedures (e.g. a subito pp and a chromatic deceptive cadence), cannot be perceived as expressing a single expressive act, a unifying gestural motive, unless one
transcends the structural and the syntactic. What associates these gestures more than any structural similarity is their shared expressive allusion: a violent interference with
an implied course of events, frustrating an expected close and resolution; and it is shared expression that enables them to serve as agents of expressive coherence--the aim
of "musical hearing."
My final example illustrates gestural coherence on a larger scale. If analogues of expressive action indeed play an important role in musical experience, one may look for
such analogues also on a dimension larger than that exhibited by the small-scale examples above. The principal phrase-groups in the Appassionata's opening movement,
most of which can be conceived as variants of a single gesture (which I call, for reasons soon to be clarified, the Sisyphean gesture), provide an example of such wide
gestural utterance.
The overall melodic contours of most themes and phrase-groups in this movement outline vast, asymmetrical curves, circuitously rising several octaves, only to fall back
again, swiftly and directly. These arches all delineate a process in which musical activity intensifies and musical patterning gradually disintegrates as the curve rises,
finally breaking down into a repetitious, shapeless fall. Structurally, this process may be described as a four-phase progression:
(1) A low register group, symmetrically and hierarchically articulated, presents distinct rhythmic and melodic figures. Despite the group's neat symmetry, some facets of
its organization, such as a metrical clash between parts (2nd theme), or a steep initiating ascent (opening theme) charge the music with tension and instability.
(2) A repetition of this group, higher in pitch.
(3) A further rise, characterized by simultaneous processes of intensification and disintegration, conveyed through a host of alternative means. Intensification may be
conveyed by a progressive shortening of phrase length, a steeper rise, an acceleration of attack rate, harmonic progression (tonic harmony is usually situated at the registral
bottom, while dominants or their substitutes are at the top), or dynamics. Disintegration is conveyed by thematic liquidation (a progressive elimination of distinct melodic
and rhythmic patterns), and by gradually breaking symmetrical and hierarchical phrase structure into a chain of short, uniform figures. This phase may lead to a climactic
registral highpoint, marked by a shiver-like trill or arpeggiation over a dissonant harmony, or break directly into phase 4.
(4) A rapid fall back through several octaves to the initial register, composed of a repetitious and uniform melodic and rhythmic pattern over an unchanging harmony
(prolonging the peak harmony, to be resolved only at the registral bottom). Thus, falls present a paradoxically active inertia, displaying the fastest rate of motion, yet the
slowest rate of change. If very fast, the fall's "momentum" may generate, near its end, a brief bounce upward (e.g., mm. 15-16).
Table 1 marks these phases in the movement's three main subjects and in the climactic retransition ending its development section.
                                     Phase 1      Phase 2      Phase 3      Phase 4
Principal subject (mm. 1-16)         mm. 1-4      mm. 5-8      mm. 9-13     mm. 13-16
Secondary subject (mm. 35-51)*       mm. 35-39    mm. 39-41    mm. 41-46    mm. 47-51
Closing subject (mm. 51-65)          mm. 51-54    mm. 55-58    mm. 59-60    mm. 61-65
Climax & retransition (mm. 109-135)  mm. 109-113  mm. 113-117  mm. 117-126  mm. 126-135
In the above discussion and musical analyses I presented a hypothesis concerning the musical objects and processes ordinary "musical listening" may attend to, and the
facilities that may enable it. I suggested that sensitivity to the expressive, kinesthetic, and inter-modal associations of simple proto-musical factors, as encountered in
extra-musical situations (combined with sensitivity to some aspects of surface musical syntax) may take our "musical listener" a long way toward a non-propositional
grasp of music as an expressive, dynamic whole. Not less important, I suggested (and tried to exemplify) that such "brute" elements can also be used as sophisticated
compositional tools, creating intricate thematic and motivic connections and processes, and thus enhancing structural coherence in music.
Here, perhaps, is a possible bridge between Cook's two separate worlds--"musical" listening and "musicological," that is analytical, discourse. It is, though, a bridge that
has yet to be erected.
References:
Berry, W. (1976). Structural Functions in Music. Englewood Cliffs, N.J: Prentice-Hall.
Bregman, A. S. (1990). Auditory Scene Analysis. Cambridge, Mass.: MIT Press.
Clynes, M., & Nettheim, N. (1983). The living quality of music. In M. Clynes (Ed.), Music, Mind, and Brain: The Neurobiology of Music. New York: Plenum, 47-82.
Cohen, D. (1971). Palestrina counterpoint: a musical expression of unexcited speech. Journal of Music Theory 15/1, 99-111.
Cook, N. (1987). The perception of large-scale tonal closure. Music Perception 5 (2), 197-205.
--------. (1990). Music, Imagination, and Culture. Oxford: Clarendon.
Cumming, N. (1997). The subjectivities of 'Erbarme Dich'. Music Analysis 16 (2), 5-44.
Eitan, Z. (2001). Thematic gestures: Theoretical preliminaries and an analysis. Orbis Musicæ 13.
Francès, R. (1988). The Perception of Music. Translated by W.J. Dowling. Hillsdale: Erlbaum.
Friberg, A., and J. Sundberg (1997). "Comparing Runners' Decelerations and Final Ritards." In A. Gabrielsson (Ed.), Third Annual ESCOM Conference: Proceedings.
Uppsala, 582-586.
Gotlieb, H., & Konecni, V.J. (1985). The effects of instrumentation, playing style, and structure in the Goldberg Variations by Johann Sebastian Bach. Music Perception 3
(1), 87-102.
Hatten, R. S. (1997-99). Musical gesture: on-line lectures. Cyber Semiotic Institute, University of Toronto. URL: http://www.chase.utoronto.ca/epc/srb/cyber/hatout.html
Konecni, V.J. (1984). Elusive effects of artists' "messages." In W. R. Crozier and A. J. Chapman (Eds.), Cognitive Processes in the Perception of Art. Amsterdam: North
Holland, 71-93.
Lidov, D. (1987). Mind and body in music. Semiotica 66, 69-97.
Millar, J.K. (1984). The aural perception of pitch-class set relations: A computer-assisted investigation. Ph.D. dissertation, North Texas State University.
Papousek, M. (1996). Intuitive parenting: a hidden source of musical stimulation in infancy. In I. Deliege and J. Sloboda (Eds.), Musical Beginnings: Origins and
Development of Musical Competence. Oxford, New York, and Tokyo: Oxford University Press, 88-112.
Sloboda, J. (1998). Does music mean anything? Musica Scientiæ 2/1, 21-32.
Stern, D. N. (1984). Affect atunement. In J. D. Call, E. Galenson, and R. L. Tyson (Eds.), Frontiers of Infant Psychology, Vol. 2. New York, Basic, 3-14.
Sullivan, J. W. and Horowitz, F. D. (1983). Infant intermodal perception and maternal multimodal stimulation: implications for language development. In L. P. Lipsitt and
C. K. Rovee-Collier (Eds.), Advances in Infancy Research, Vol. 2. Norwood, N. J.: Ablex, 183-239.
Back to index
Proceedings abstract
Mr Matthew M Lavy
mml1000@jesus.cam.ac.uk
Background:
Method:
Results:
Conclusions:
Back to index
Proceedings paper
Aims
In this research we adopt Tulving’s paradigm in order to investigate what can help memory
while listening to music, and thereby to determine which perceived characteristics contribute to
building up the subjective temporal schema implicit in every instance of temporal processing.
In particular, we investigate whether salience can prevail over tonality in determining
how well a musical theme is remembered. A positive answer would help to overcome the problem of the
cognitive processing of atonal music.
As the concept of salience is not univocal, a subsidiary goal is also considered: determining
possible differences or correspondences between objective salience, as defined by musicology, and
perceived salience, as assessed by psychology.
Method
We used a set of 48 new melodic stimuli composed for the purpose by F. Cifariello Ciardi. The stimuli
are grouped into 4 categories (12 stimuli each) according to musical genre: Tonal Salient (TS);
Tonal NonSalient (TNS); NonTonal NonSalient (NTNS); NonTonal Salient (NTS). An operational
definition of salience was given to the composer on the basis of a predefined grid of both tone
intervals and temporal parameters.
The stimuli were divided into two lists of 24 stimuli each (6 stimuli per category). In the study phase,
subjects listened to one of the two lists 1, 2, or 3 times. Afterwards, in the test phase, subjects listened
to all 48 stimuli and had to recognize those they had heard earlier by means of R (remember) or K
(know) responses.
296 subjects of varying age and musical experience were examined.
Besides the statistical analysis of the collected data, a musicological analysis of the stimuli was
performed by E. Pozzi.
Results
In summary, our results show that: additional study trials improved recognition in adults but not in
younger children; gestalt accentuation enhanced precise recollection, especially for Salient stimuli;
and more R responses were given for Salient stimuli and, when salience was absent, for Tonal ones.
By means of the Theory of Signal Detection we found that 6 stimuli (3 of them NTS) were
recollected better than the others in their category, and 7 stimuli (3 of them TS) were remembered
worse than the others in their category. A good correspondence was found with the results of the
musicological analysis.
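The sensitivity index underlying the signal-detection analysis can be sketched as follows; the hit and false-alarm counts below are hypothetical, for illustration only, and are not the study’s data.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' from recognition counts.

    A log-linear correction (add 0.5 to each cell) avoids infinite
    z-scores when an observed rate is exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one stimulus category (24 old, 24 new items):
print(round(d_prime(hits=20, misses=4, false_alarms=6, correct_rejections=18), 2))
```

A category whose stimuli yield a higher d′ than the rest of its category is the kind of outlier identified above.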
The cluster analysis shows 2 main clusters corresponding to all Salient stimuli (TS + NTS) on one
side and to all Non Salient stimuli on the other (TNS + NTNS).
The ANOVA shows: significantly more R responses for all Salient stimuli; more K responses for TS;
fewer K responses for NTS; more wrong R responses for NTNS; more wrong K responses for TS; and
more wrong recognitions for all NonSalient stimuli (the highest level for NTNS). These results confirm
that, in the development of the relative hierarchy of incoming musical events, Salience prevails over
Tonality in affording a better anchorage for musical memory.
References
Butler, D. (1990). A study of event hierarchies in tonal and post-tonal music. Psychology of Music, 18
(1), 4-17.
Cifariello Ciardi, F. (1999). Natura e funzione della discontinuità nell’ ascolto musicale. Draft for the
ECONA Symposia.
Conway, M.A., Gardiner, J.M., & Perfect, T.J. (1997). Changes in memory awareness during
learning: The acquisition of knowledge by psychology undergraduates. Journal of Experimental
Psychology: General, 126 (4), 393-413.
Gardiner, J.M., Kaminska, Z., Dixon, M., & Java, R.I. (1996). Repetition of previously novel
melodies sometimes increases both remember and know responses in recognition memory.
Psychonomic Bulletin and Review, 3 (3), 366-371.
Imberty, M. (1999). Continuité et discontinuité de la matière sonore dans la musique du XX siècle.
General Psychology, 3-4, 49-69.
Java, R.I., Kaminska, Z., & Gardiner, J.M. (1995). Recognition memory and awareness for famous
and obscure musical themes. European Journal of Cognitive Psychology, 7 (1), 41-53.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge: MIT Press.
Olivetti Belardinelli, M., & Rossi Arnaud, C. (1999). Recollection and familiarity in recognition
memory for musical themes. In Proceedings of the XI Conference of the European Society of
Cognitive Psychology. Academic Press, Gand, Sept. 1-4, 1999, p.193.
Rajaram, S. (1996) Perceptual effects on remembering: Recollective processes in picture recognition
memory. Journal of Experimental Psychology: Learning, Memory & Cognition, 22, 365-377.
Tulving, E. (1985). How many memory systems are there? American Psychologist, 40(4), 385-398.
Proceedings paper
Abstract
This paper examines recent theories of children’s motivation and strategies in music
learning, within a case study of an adoptive family in which the middle child takes on a
teaching role in his younger sister’s practice. Questions of heredity and environment are
raised by the adoptive relationships, so that this research offers a new perspective on the
ongoing nature/nurture debate in music research. Video evidence and in-depth interviews
with each family member are used to explore their perceptions of each other’s skills and
roles in relation to music. The sense of freedom expressed by the parents allows the
individuality of the children to emerge strongly, as the family seek to provide narratives that
explain the children’s skills and interests in relation to their family and culture of origin, as
well as their existing environment.
Introduction
This paper brings together two seemingly diverse areas in which psychological research has
advanced understanding in recent years: the dynamics of adoptive families, and children’s
motivation and strategies in music learning. Dealing with a single case study family, we are
able to look at questions of heredity and environment in the development of musical skills.
The focus of the investigation is the Eccles family, who live in a wealthy suburb of a major
city:
intelligence and physical characteristics, genetic inheritance plays a significant part’ (1998:
8). A large scale US survey of ‘mental health of adopted adolescents’ (Benson et al, 1994)
showed that genetic roots were less of an everyday concern for the children, who tended to
explain their skills and behaviours in terms of patterns learnt within the adoptive family,
rather than their family of origin. Adoptive relationships are subject to the same strains as
any other family (Triseliotis et al, 1997), with sibling rivalry, age differences and sibling
position effects (Sulloway, 1996) occurring amongst the children. Trans-cultural adoptive
families face additional challenges, and the Eccles family falls into this pattern, the parents
being of white European heritage, and the children of South American origin, coming from
Chile and Colombia. The literature is divided in its analysis of trans-racial adoptions, with
Hoksbergen (1997) emphasising the greater responsibilities that are placed upon the
parents, whilst accounts reported by Austin (1990) suggest that the comments of other
people are most likely to cause distress to parents and children (cf. Bartholet, 1993).
The case study family
Perceptions of the children’s characteristics
Suzie and Bob Eccles take a robust approach to their parenting roles, and Suzie says ‘It’s
actually quite wonderful that they’re adopted, because when someone says, "Oh, isn’t Lucy
beautiful?", I can say with a totally clear conscience "Oh yes, she’s absolutely stunning, isn’t
she?"’. Their perception that the children’s characteristics, whether physical, intellectual or
emotional, are very much their own, seems to make for a healthy parent-child relationship,
where any dreams held for the children are not the result of frustrated parental ambition, as
they might be in a genetic family. As Suzie describes it, ‘The children are entirely different
and it’s sort of meeting those needs when the occasion arises’. Whilst the Eccles children
identify closely with their adoptive parents - James seeing his musical interests as having
come from his father, and Sean wanting to emulate Bob as an accountant - Bob and Suzie
see their role as one of exposure, giving the children access to diverse opportunities,
through which they can ‘find their passion’. Unlike biological parents, they have less of a
predetermined idea about what that passion might be, because they are not trying to identify
their own physical and mental characteristics in their children, but rather accepting them for
who they are.
Through the varied opportunities that their parents have given them, each of the Eccles
children has learnt two musical instruments: the piano and a band instrument. Despite this
equality of provision, it is James who has been labelled as ‘the musical one’ by the rest of
the family, whilst Sean is seen as more studious, and Lucy is the dizzy artist. The three
children are generous in their perceptions of each other’s skills and interests, and even
where Sean feels he has been overtaken in musical ability by James, he acknowledges that
‘people have talents’, and recognises his own skills in maths and science, which he identifies
with Bob, who is an accountant. James articulates his passion for music, stating that ‘at
school they know me for my music talent; I would hate not to be able to play, I’d hate to not
have that privilege.’ He too identifies with his father, saying that ‘he gets all emotional when
he listens to music’, an engagement that James can empathise with. Lucy is far more willing
to give up her musical ambitions, as dance and art are her personally acknowledged strengths.
She claims to have inherited her artistic interests from Suzie, but links dance with her
Colombian origins, showing an interesting resolution of her identity as an adopted child.
Clearly, there is a gender split in evidence, as the boys identify with their father’s interests,
whilst Lucy feels a closer connection with her mother. Support for their interests comes from
the family environment, yet there is an acknowledgement that their country and family of
origin are also influential, with some sense of confusion implicit in the children’s comments
about which set of parents has provided the critical genetic variable.
Teaching and learning roles
There is a substantive family research literature showing that eldest children often take on
the role of teacher, most typically with older girls teaching younger brothers (Dunn &
Kendrick, 1982). Within the Eccles family, James, the middle child, has taken on the role of
musical expert and so transcends the normal sibling boundaries, giving advice to his older
brother and acting as Lucy’s teacher, despite the small age difference between them. Lucy is
complicit in this arrangement, having chosen to play the same two instruments as James,
and preferring to be taught by him rather than an adult; ‘I don’t really need a teacher every
day, because I can be helped by my brother’. Video evidence shows James revelling in the
teacher persona - ‘Come on Luce, concentrate and try it one more time’ - whilst Lucy shows
a high level of responsiveness throughout long practice sessions (c. 45 mins.). Immediately
after teaching Lucy, James begins his own practice, demonstrating complete absorption and
intrinsic motivation. Despite his youth and lack of explicit teaching strategies, James shows
an awareness of different practising styles, including long term repetition and more detailed
work, in line with established theory (Hallam, 1998); as he says, ‘practising does not just
mean playing it once and flipping the page over’.
Two years into learning the clarinet, Lucy decided to give up, stopping her lessons with
James and withdrawing from the school band. She and her parents are adamant that this
was caused by her dislike of early morning band practices, and her growing enthusiasm for
dance, which was occupying increasing time and energy. The decision to give up was
Lucy’s, as Suzie says ‘I won’t keep badgering them, that’s why Lucy dropped the clarinet,
because I said I’m not going to be the one yelling and screaming about practice’. For the
highly self-motivated James, this lack of explicit direction provides enough of a supporting
environment, but Lucy’s waning interest suggests that a greater level of parental intervention
can sometimes be necessary. James, having held the role of extrinsic motivator for Lucy,
seems to feel some sense of guilt at her decision, blaming his teaching: ‘I was just trying to
teach her what she had to learn ... but sometimes I would go too far, I would go out of her
span. I could have done better if I was older, like me now and her when she was, because
I’d know what to do’.
Conclusions
This brief glimpse of the case study family has highlighted the complex interaction between
heredity and environmental influence in the development of musical skill. At a genetic level,
the three children are recognised as being very different, unrelated to their parents and to
each other. There are some connections of cultural identity, as they are all South American,
and family narratives acknowledge that some characteristics, such as Lucy’s love of
dancing, may have come through the genes. The immediate concern for the family,
however, is to create a secure and enriching environment, offering opportunities for
individual fulfilment.
Within this framework, it is not clear whether James’s identified talent for music is something
he has inherited from his family and culture of origin, or whether it has come from the
opportunities afforded to him within the adoptive environment. What is clear, however, is that
he is abundantly more self-motivated than the other two children, especially in music. So, is
it that James has a personality characteristic that predisposes him towards solitary,
systematic learning, for which music is only one possible outlet? His other interests, in
writing and journalism, support this. Or, is the balance more towards identifying with Bob’s
love of music, and the ideal of what being a musician represents? James is very pleased
with the lucrative opportunities offered by busking, for instance, and likes his status within
school and the family as a musician: ‘I just always liked music and I’ve kind of grown up with
it’. The third possibility is that James accesses something intrinsic within the music, but
deciding between these three explanations is virtually impossible, and it is likely that the third
is in itself a subtle consequence of the other two. Genetic and environmental factors are
interwoven in the children’s explanations, so that for them, music is a product both of
inheritance and the life they are now leading. The circularity of the nature/nurture debate
demands that both influences are equally acknowledged, without being shaped by parental
pressure or expectation. Sean, James and Lucy illustrate the importance of allowing
individuality to flourish, so that musical development can become a source of intrinsic and
growing satisfaction to the child.
References
Austin, J. (Ed) (1990) Adoption: The Inside Story. London: Barn Owl Books.
Bartholet, E. (1993) Family Bonds: Adoption and the Politics of Parenthood. Boston:
Houghton Mifflin Company.
Benson, P. L., Sharma, A. R. & Roehlkepartain, E. C. (1994) Growing up adopted: A portrait
of adolescents and their families. Minneapolis: Search Institute.
Capron, C. & Duyme, M. (1989) Assessment of effects of socio-economic status on IQ in a
full cross-fostering study, Nature, 340: 552-554.
Davidson, J. W., Howe, M. J. A., Moore, D. G. & Sloboda, J. A. (1996) The role of parental
influences in the development of musical performance, British Journal of Developmental
Psychology, 14: 399-412.
Davidson, J. W., Howe, M. J. A. & Sloboda, J. A. (1997) Environmental factors in the
development of musical performance skill in the first twenty years of life, in D. J. Hargreaves
& A.C. North (Eds.), The Social Psychology of Music (pp. 188- 206). Oxford: Oxford
University Press.
Dunn, J & Kendrick, C. (1982) Siblings: Love, envy, and understanding. Harvard University
Press, Cambridge, MA.
Hallam, S. (1997) Approaches to instrumental practice of experts and novices: Implications
for education, H. Jorgensen & A.C. Lehmann (Eds.), Does practice make perfect? Current
theory and research on instrumental music practice (pp. 89-107). Oslo: Norges
Musikkhogskole.
Hallam, S. (1998) Instrumental Teaching: A practical guide to better teaching and learning.
Oxford: Heinemann.
Hill, M. (1998) Concepts of parenthood and their application to adoption, M. Hill & T. Shaw
(Eds.) Signpost in Adoption: Policy, practice and research issues (pp. 30-44). London: British
Agencies for Adoption and Fostering.
Hoksbergen, R. A. C. (1997) Child Adoption: A Guidebook for Adoptive Parents and Their
Advisors. London: Jessica Kingsley.
Howe, D. (1998) Adoption outcome research and practical judgment, Adoption and
Fostering, 22 (2): 6-15.
Howe, M. J. A., Davidson, J. W. & Sloboda, J. A. (1998) Innate gifts and talents: Reality or
myth?, Behavioural and Brain Sciences, 21 (3): 432-442.
Kemp, A. E. (1996) The Musical Temperament: Psychology and Personality of Musicians.
Oxford: Oxford University Press.
Pitts, S. E., Davidson, J. W. & McPherson, G. E. (in press) Developing effective practice
strategies: case studies of three young instrumentalists, Music Education Research,
forthcoming issue.
Plomin, R. (1998) Genetic influence and cognitive abilities. Behavioural and Brain Sciences,
21(3): 420-421.
Sloboda, J.A., Davidson, J.W. & Howe, M.J.A. (1994) Is everyone musical? The
Psychologist. 7 (4) 349-354.
Sloboda, J. A. & Davidson, J. W. (1996) The young performing musician, in I. Deliege & J.A.
Sloboda (Eds), Musical Beginnings: Origins and Development of Musical Competence (pp.
171-190). Oxford: Oxford University Press.
Sloboda, J. A., Davidson, J. W., Howe, M. J. A. & Moore, D. G. (1996) The role of practice in
the development of performing musicians, British Journal of Psychology, 87: 287-309.
Sternberg, R.J. (1998) If the key’s not there, the light won’t help. Behavioural and Brain
Sciences, 21(3): 424-425.
Sulloway, F. J. (1996) Born to Rebel. New York: Little, Brown & Co.
Triseliotis, J., Shireman, J. & Hundleby, M. (1997) Adoption: Theory, Policy and Practice.
London: Cassell.
Proceedings paper
equal intensity. In the two-cues-cooperating comparison timing and intensity were combined to
strengthen the same rhythmic structure. Two sequences differed in terms of the presence of higher
intensity tones again but this time the higher intensity tones were preceded by longer time intervals
compared to their counterparts in the sequence with equal intensity. In the two-cues-conflicting
comparison differences in timing undermined the difference created by the higher intensity tones. The
higher intensity tones in one sequence were preceded by shorter time intervals compared to their
counterparts in the sequence with equal intensity. One key prediction from the identification
performance was that two-cues-cooperating performance should be better than one-cue performance
but two-cues-conflicting performance should be poorer than one-cue performance. The results of the
experiments verified the prediction in that addition of the second cue helped discrimination only if it
was in the cooperating direction.
Integrality of timing and intensity in tone sequences
Interaction of timing with other dimensions of sound has been demonstrated in other ways as well. One
of them is the independence/integrality approach. Two perceptual dimensions are considered to be
processed independently, if detection of variations in one of them (the relevant dimension) is not
affected by variations in the other (the irrelevant dimension). If totally uncorrelated variation on the
irrelevant dimension reduces discrimination performance on the relevant dimension the two
dimensions are not considered to be independent (Garner, 1974; Melara & Marks, 1990). Further
conclusions can be reached from the cases when variation on the irrelevant dimension is correlated
with variation on the relevant dimension. If both positive and negative correlations between the two
dimensions facilitate discrimination on the relevant dimension the interaction is considered to be at a
sensory level. In contrast, if only positive correlation facilitates discrimination but negative correlation
makes it worse the interaction is considered to be at a higher lexical level. That is, at a level where
stimuli are labeled.
I have investigated the independence and integrality of timing and intensity as two means of accenting
in tone sequences (Tekman, in preparation). In one experiment the relevant dimension was variability
of IOIs whereas in the second one it was variability of tone intensity. In the control conditions the
irrelevant dimension was held constant. In the uncorrelated condition presence of intensity and timing
accents varied independently of each other. In the positive correlation condition if a sequence
contained higher intensity tones they followed longer time intervals and if there were no higher
intensity tones then all IOIs were equal. Conversely, in the negative correlation condition if a
sequence contained higher intensity tones then all IOIs were equal and if a sequence had equal tone
intensities then some tones had longer IOIs.
For both timing and intensity as the relevant dimension uncorrelated variation reduced accuracy in
making discriminations on the relevant dimension. Correlation of the two dimensions helped only in
the case of positive correlation. Negative correlation did not have a significant effect on detection of
timing variations and had a negative effect on detection of intensity variations. Thus, timing and
intensity did not appear to be processed independently. Facilitation by only positively correlated
variations indicated that perception of tones as accented interfered with accurate perception of the
actual physical dimensions that created accenting.
Detection of variations in expected and unexpected directions
One important question about perception of timing is how it relates to expected variations in
performance. In two experiments in which I manipulated IOIs in the directions consistent with and
opposed to the direction that they were usually varied in performance I investigated this question
(Tekman, submitted).
In one experiment, intensity was manipulated in addition to timing. In the performance of intensity accents,
shorter time intervals typically precede the higher intensity tones (Billon & Semjen, 1995; Semjen &
Garcia-Colera, 1986). In this experiment, participants had to detect temporal variations in sequences
when either shorter or longer time intervals preceded higher intensity tones. It was found that the
higher intensity tones reduced sensitivity for shorter time intervals more compared to sensitivity for
longer time intervals. That is, the expected shorter IOIs were harder to detect.
In the second experiment pitch accents were created by having skips in pitch to coincide with the time
intervals that were manipulated. In musical performance larger changes in pitch are typically
combined with longer time intervals (Drake & Palmer, 1993). In contrast to the previous experiment,
introduction of the pitch skips reduced sensitivity to longer IOIs more than sensitivity for shorter IOIs.
Consistent with the results of the previous experiment, the expected longer IOIs that coincided
with the pitch skips were harder to detect.
Another variable that was manipulated in these experiments was whether the deviant intervals had
temporal regularity or not. This variable did not affect the way accenting by variation on a second
dimension changed how timing was perceived. Thus, the way variations in other dimensions of sound
affect the perception of timing in sound sequences was closely related to how the two dimensions would
be used in performance. Furthermore, this relationship did not appear to be sensitive to global
temporal structure.
Conclusion
Multiple methods converge on the observation that perception of timing in sound sequences depends
on variations in other dimensions of sound. This results in distortions in time perception that have a
close relationship to how these dimensions interact in expressive musical performance. Such effects
are observable with simple acoustic stimuli and are not affected by manipulations of structure such as
temporal regularity. All this supports the conclusion that expressive timing variations must be
determined in part by limitations in our perception of timing.
References
Billon, M. & Semjen, A. (1995). The timing effects of accent production in synchronization and
continuation tasks performed by musicians and nonmusicians. Psychological Research, 58,
206-217.
Drake, C. & Palmer, C. (1993). Accent structures in music performance. Music Perception, 10,
343-378.
Garner, W. R. (1974). The processing of information and structure. Potomac, Maryland:
Erlbaum.
Melara, R. D. & Marks, L. E. (1990). Interaction among auditory dimensions: Timbre, pitch,
and loudness. Perception & Psychophysics, 48, 169-178.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental
Psychology: Human Perception and Performance, 15, 331-346.
Penel, A. & Drake, C. (1998). Sources of timing variations in music performance: A
psychological segmentation model. Psychological Research, 61, 12-32.
Repp, B. H. (1990). Patterns of expressive timing in performances of a Beethoven minuet by
nineteen famous pianists. Journal of the Acoustical Society of America, 88, 622-641.
Repp, B. H. (1992). Probing the cognitive representation of musical time: Structural constraints
on the perception of timing perturbations. Cognition, 44, 241-281.
Repp, B. H. (1995). Detectability of duration and intensity increments in melody tones: A
partial connection between music perception and performance. Perception & Psychophysics,
57, 1217-1232.
Repp, B. H. (1996). The art of inaccuracy: Why pianists’ errors are difficult to hear. Music
Perception, 14, 161-184.
Repp, B. H. (1998). Variations on a theme by Chopin: Relations between perception and
production of timing in music. Journal of Experimental Psychology: Human Perception and
Performance, 24, 791-811.
Semjen, A. & Garcia-Colera, A. (1986). Planning and timing of finger-tapping sequences with a
stressed element. Journal of Motor Behavior, 18, 287-322.
Shaffer, L. H., Clarke, E. F., & Todd, N. (1985). Meter and rhythm in piano playing. Cognition,
20, 61-77.
Tekman, H. G. (1995). Cue trading in the perception of rhythmic structure. Music Perception,
13, 17-38.
Tekman, H. G. (in preparation). Perceptual integrality of timing and intensity variations as
means of creating accents. Music Perception.
Tekman, H. G. (submitted). Accenting and detection of timing variations in tone sequences:
Different kinds of accents have different effects. Perception & Psychophysics.
Todd, N. P. (1985). A model of expressive timing in tonal music. Music Perception, 3, 33-58.
Windsor, W. L. & Clarke, E. F. (1997). Expressive timing and dynamics in real and artificial
musical performances: Using an algorithm as an analytical tool. Music Perception, 15, 127-152.
Proceedings paper
Intonation and Interpretation in String Quartet Performance: the case of the flat leading note
Peter Johnson, Birmingham Conservatoire (UK)
EMail: peter.johnson@uce.ac.uk
In a famous paper from the 1930s, Carl Seashore observed a tendency among violinists to adopt the very wide major thirds and narrow semitones of Pythagorean tuning (Seashore 1938: 218-224). His data,
however, reveal extreme deviation in the size of these intervals, and this is inconsistent with any formal temperament. The minor seconds vary from 38 cents smaller than equal temperament to 18 cents larger, a
difference of more than half a semitone (p.221). Either he was using poor performances, or the variety of intonational practice must be regarded as musically significant. Studies of unaccompanied cello playing,
including recent and admired recordings by Mischa Maisky and Anner Bylsma, show a similar predilection for wide major thirds and narrow semitones together with a striking range of actual tunings (Johnson
1999b).
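The deviations quoted here are measured in cents, the standard logarithmic unit for comparing tunings (100 cents = one equal-tempered semitone). A minimal sketch of the computation; the ratios below are illustrative textbook values, not Seashore’s measurements:

```python
import math

def cents(f_upper: float, f_lower: float) -> float:
    """Interval between two frequencies (or frequency ratios) in cents."""
    return 1200 * math.log2(f_upper / f_lower)

# An equal-tempered semitone is exactly 100 cents:
semitone_et = cents(440 * 2 ** (1 / 12), 440.0)

# The gap between the Pythagorean major third (81/64) and the pure
# (just) major third (5/4) is the syntonic comma, about 21.5 cents:
comma = cents(81 / 64, 5 / 4)
```

On this scale, a minor second 38 cents narrower than equal temperament, as in Seashore’s data, is more than a third of a semitone flat.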
Casals has argued that good playing demands what he calls 'expressive intonation', where tones are sharpened or flattened in accordance with the direction of melodic movement (Blum 1987). This principle is
consistently applied in his own recording of Bach's C minor Sarabande (Johnson 1999b). Intonation thus serves as an indexical sign of voice-leading. But is this its only expressive function? It is widely assumed in
the profession that intonation has a more general expressive function, and it is well-known that we tend to interpret small deviations of tuning qualitatively rather than quantitatively (Makeig 1982). And what
happens when a performer reverses the tendency, by playing a falling leading-note sharper than a rising one in the same phrase? In the Notes that append this paper, I discuss an example from Beethoven's last slow
movement, the Lento assai from the Op.135 quartet (Example 1). In bars 3 and 5, there are conjunct falling and rising leading-notes in the first violin. Out of 25 recordings, sixteen violinists take the falling
leading-note in bar 3 sharper than the rising, five tune the notes almost identically, and only four conform to 'expressive tuning' (Johnson 1999b). Explanations can no doubt be found for this reversal of normal
practice in the special qualities of Beethoven's music, but these would confirm an expressive function for intonation transcending the strictly syntactical.
In fact, whatever principle we devise to justify normal practice, the anomalies remain problematic. From the study of leading-notes at the start of the Lento assai, one recording in particular stands out as highly
idiosyncratic. This is No.17, the Lindsays' 1987 recording, in which the leader consistently tunes all the leading-notes in bars 3 and 5 as pure major thirds against the dominant in the second violin. Against the
normative tunings of Equal Temperament, this gives very flat leading-notes and correspondingly wide semitones to and from the adjacent tonics.
What are we to make of such divergent practices? In particular, how do we 'read' and experience the idiosyncratic tunings in the Lindsays' recording? In this paper, I propose to address these questions by comparing
this recording with two others. One will be an example of very sharp tuning in the same bars of the Lento assai, the other an application of Just Intonation by the Lindsays in the very different context of the Heiliger
Dankgesang from Op.132. First, however, we need critically to review our methods.
Example 1
1. Method
An obvious problem in dealing with an idiosyncratic performance is how we assess its competence. How do we know that our analyses are not revealing simple errors or miscalculations in the execution of the
performance? By using commercially released recordings, we have a strong, although not foolproof, assurance of an error-free performance that has met the approval of its performers, the producer and engineer,
and eventually the wider community of experts and the listening public. On the other hand, recordings made in the laboratory carry no such assurances, neither are they normally available for external scrutiny or
further analysis.
How, then, do we accurately and reliably analyse intonation from a recording? Necessarily, we have to work from small samples, and these need to be representative or symptomatic of the larger musical context.
My examples have been supported by note-by-note analyses of adjacent events, which confirm the stylistic integrity of the performances. A preparatory survey exposes the moments of special interest, which,
perhaps not surprisingly, tend to involve the leading-notes.
We can define the frequency spectrum of any complex musical event quickly and efficiently using Spectrum Analysis. In ensemble performance, the chief question raised by this method is whether the fundamental
frequencies are accurate indicators of our perceptions of pitch and interval. In a paper on intonation in Barbershop singing, Hagerman and Sundberg support the contention that they do (1980), and empirical tests in
which single-beat extracts are compared with synthesised tones of known frequency, also confirm this in most cases (Johnson 1999b). Nevertheless, there are anomalies, and the Lindsays' recording of the Lento assai
provides an interesting example.
Figure 1 shows the spectra of two chords. In the upper plot, the chord is from the opening chorale of the Heiliger Dankgesang from Op.132, and represents the frequency-content of the C major chord in bar 4. The
leading-note is E4 in first violin, and C3, G3 and C4 complete the dominant harmony in the lower strings. The lower plot is from the Lento assai, and represents the last beat of bar 3, a more complex second
inversion dominant seventh, but also with the leading note, C4, in melodic prominence in first violin (see Example 1).
Figure 1
The accuracy of the frequency-data shown in Figure 1 is determined by the value of k, shown to the right of the title-line. The analysis resolves the source signal into discrete bands each of width k Hz, and shows
the strength of each band. By adjusting this value, more precise readings are available without overtaxing a standard personal computer, but there is a trade-off between accuracy and efficiency. Amplitude levels in
the figure are shown as decibel-difference calculated from the strongest peak in the signal. For a more technical description see the Notes appended to this paper.
In both cases, the analysis has generated a spectrum from k to 11025 Hz (the Nyquist frequency), but it is evident that in this quiet music, most of the relevant acoustical information is contained within the 50-3000Hz range. We can see a
marked difference between the two plots, the lower showing a tailing off of peaks in the spectrum above about 900Hz, whereas the upper plot suggests that we shall need to rescale the plot to gain access to all the
significant harmonics. To this extent, the analyses give a good visual analogue of the differences of tone-colour of the two chords. Although both are played quietly and in the same tessitura, the lower has a
noticeably darker quality of tone.
2. Just Intonation
The two extracts analysed in Figure 1 are chosen because they illustrate Just Intonation in the tuning between first and second violins. The relevant calculations are, from the upper plot:
C4 at 263.45 x 5/4 gives E4 at 329.3Hz
and from the lower plot:
Aflat3 at 209.44 x 5/4 gives C4 at 261.8Hz.
Both results are within 0.4 Hz of the actual peaks for the leading-notes, a barely perceptible difference of 2.2 cents. The margin of error is less than k x 5/4 = 0.21 Hz, or about 1 cent. For a brief explanation of the
relevant acoustic theory see the Notes at the end of this paper.
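The arithmetic above is easy to check. Here is a minimal sketch in Python, using the peak frequencies quoted in the text; the helper name is mine, not the paper's:

```python
import math

def cents(f1, f2):
    """Size of the interval from f1 up to f2, in cents (1200 per octave)."""
    return 1200 * math.log2(f2 / f1)

# Upper plot: C4 at 263.45 Hz, leading-note E4 as a Just major third (x 5/4)
predicted_e4 = 263.45 * 5 / 4        # 329.3125 Hz
# Lower plot: Aflat3 at 209.44 Hz, leading-note C4 as a Just major third
predicted_c4 = 209.44 * 5 / 4        # 261.8 Hz

# A 0.4 Hz discrepancy in this register is on the order of 2 cents:
discrepancy = cents(329.3, 329.3 + 0.4)
print(f"E4 predicted: {predicted_e4:.1f} Hz; 0.4 Hz off = {discrepancy:.1f} cents")
```

The cents function makes explicit that pitch discrimination is logarithmic in frequency, so the same Hz discrepancy means different interval sizes in different registers.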
Examination of the lower-order harmonics in the upper plot reveals close conformity to the exact integer multiples of acoustical theory. In the lower plot, that is the case for all the harmonics of the first violin (< 2
cents) but not for those of the second violin's Aflat3. The second harmonic at 416.03Hz is flatter than the theoretical second harmonic (209.44 x 2) by almost 12 cents. In other words, the interval between the
second harmonics of first and second violins is an equal tempered major third. In fact, the peaks at Aflat4 and C5 are close integer multiples of the cello's Aflat2. The fundamental of this tone is unclear in Figure 1,
but by replotting the figure with a higher sample-size and hence greater precision (see Notes), Aflat2 is shown to be 104.2Hz. The multiples of this frequency are within 4 cents of the peaks at Aflat4 and C5, < 4
cents being the level of precision of these calculations. The tuning of the other cello note, the Eflat, is very close to Just Intonation in relation to the two violin tones. One other detail to emerge from this spectrum is
that the viola's Gflat3 is tuned as an almost true 7th harmonic against the second violin's A flat (183.54 x 8/7 = 209.76). This Just dominant seventh is some 25 cents flatter than the Just major second from A flat.
3. Comparison and Interpretation
We have, then, two examples of applied Just Intonation, from the same ensemble and in a similar repertoire. But many other factors are different. In the case of the Heiliger Dankgesang, the intention to find pure
tuning is explicit throughout the opening statement of the Chorale, which is to say that we hear the passage in terms of the tones of Just Intonation. However, the execution does not always match this ideal, and in
the other chords there is a liveliness in the sonorities arising from intonation that is very nearly but not quite pure. And we have seen that in the Lento assai, there is a direct conflict between the second violin and
cello A flats, creating on the one hand a Just major third and on the other, an equal-tempered tenth. Both performances are therefore less than ideal. Should we therefore conclude that our sources are invalid as
exemplary or at least expert performances? The expert community has clearly judged otherwise, for these recordings have been in the CD catalogues already for some 13 years. Let us make up our own minds by
hearing the extracts. Here, first, is the start of the Heiliger Dankgesang.
[HERE PLAY
OP.132iii b.1-6 (Lindsays)]
What I find interesting here is the way we differentiate between intention and execution. We can handle the differences either evaluatively or interpretatively, but if we elect to listen evaluatively, we must bear in
mind that the apparent imperfections have not been edited away in the recording process, as they could have been. If instead we listen interpretatively, we assume that what we hear is intentional or acceptable to the
performers and, in some musical sense, construct meaning from what we hear. In the Lindsays' Chorale, for instance, there is no singing persona from the first violin, as there is in some other recordings. With no
hint of vibrato, there is a clear intention of creating a blended, single sonority for each chord. Yet the first violin is still the top line and the step-wise movement confirms that this is a chorale melody. There is
therefore a deictic tension between what we know the music could be, and what we hear. This tension, when appropriately used, is, I suggest, one of the most positive aspects of good performance. It explains why
we need more than one good performance of the same work, and why there is no single definitive interpretation.
If the peculiar tension in the Lindsays' playing of the chorale arises from the gap between intention and realisation, it is reinforced by the withdrawal of persona from the first violin, which plays the melody almost
as if it were not the principal line. This is signified by the lack of vibrato and by using Just Intonation, thus denying the normative sharpening towards the tonic. But Just Intonation, I suggest, has its own surcharge
of meaning, and this may even transcend the semiotic in its strictly natural origins. It is perhaps its very neutrality that has made Just Intonation unusual in mainstream classical performance over the last seventy
years. On the other hand, period instrument performance, in which Mean-tone tuning with its pure thirds is widely used, serves to remind us that any interpretation of performance practice is style-specific, and that
Additional Notes
1. Pitch-names, cents and intervals
An equal tempered semitone is 100 cents. At C4, a difference of 2 cents is about 0.3 Hz, and the equivalent in other registers can quickly be calculated by multiplying or dividing by a factor of 2 per octave. So at C2,
2 cents represents a difference of about 0.075 Hz. Note that in Seashore (1938), intervals are shown in fractions of a whole-tone, where 0.01 = 2 cents.
Just Intonation is defined by the convergence of the lowest common harmonics between any two tones. For two frequencies, f1 and f2, the Just major third is given as f1x 5 = f2 x 4, so that f2 = f1 x 5/4 (386 cents).
The Just perfect fifth is 3/2, and the major seventh is 15/8.
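These ratios convert to cents via 1200 x log2(ratio). A quick check in Python (note that the Just major seventh, 15/8, is the product of a Just fifth and a Just major third):

```python
import math

def ratio_to_cents(r):
    """Convert a frequency ratio to cents (1200 cents per octave)."""
    return 1200 * math.log2(r)

print(round(ratio_to_cents(5 / 4)))    # Just major third: 386 cents
print(round(ratio_to_cents(3 / 2)))    # Just perfect fifth: 702 cents
print(round(ratio_to_cents(15 / 8)))   # Just major seventh: 1088 cents

# Register scaling (Note 1): 2 cents at C4 (261.63 Hz) is about 0.3 Hz
print(round(261.63 * (2 ** (2 / 1200) - 1), 3))
```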
2. Spectrum Analysis
The accuracy of a spectrum analysis depends upon the ratio between the sample-rate of the source recording and the size of sample submitted to analysis. The resultant analysis is a plot of frequency against
amplitude, the readings along the frequency axis proceeding in steps of k Hertz, where
k = sample-rate/sample-size.
It is this ratio that determines precision in the reading of frequency. In Figure 1, k is set at 0.1682Hz, this being the ratio between a sample-rate of 22.05kHz and a sample-size of 2^17 (131072 samples). The latter in fact represents a
duration of almost six seconds, but shorter extracts may be used with the same sample-size by padding the source-file with zeros. This generates mathematically predictable anomalies in the form of smooth curves
connecting the peak readings, but does not affect the peaks. A Hanning Window is applied to reduce this effect. In Figure 1, the original sound-source has a duration of 1.5s, about a quarter of the full sample duration.
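The procedure just described (zero-pad a short extract to the analysis sample-size, apply a Hanning window, and read amplitudes as dB relative to the strongest peak) can be sketched as follows. This is my own reconstruction in Python/NumPy under the stated parameters, not the SPAN code itself:

```python
import numpy as np

SAMPLE_RATE = 22050        # Hz, as in Figure 1
SAMPLE_SIZE = 2 ** 17      # 131072 samples, so k = 22050/131072 = ~0.1682 Hz

def spectrum_db(signal, sample_rate=SAMPLE_RATE, sample_size=SAMPLE_SIZE):
    """Return (frequencies, amplitudes in dB relative to the strongest
    peak) for a short extract, zero-padded to sample_size."""
    # Hanning window reduces the artefacts introduced by zero-padding
    windowed = signal * np.hanning(len(signal))
    padded = np.zeros(sample_size)
    padded[:len(windowed)] = windowed
    magnitude = np.abs(np.fft.rfft(padded))
    freqs = np.fft.rfftfreq(sample_size, d=1.0 / sample_rate)
    db = 20 * np.log10(np.maximum(magnitude, 1e-12) / magnitude.max())
    return freqs, db

k = SAMPLE_RATE / SAMPLE_SIZE     # frequency step per bin, ~0.1682 Hz
```

With a 1.5 s extract, the zero-padding interpolates the spectrum smoothly between the true peaks, which is the "smooth curves" effect described above.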
The software I am using is SPAN, which is a purpose-specific implementation of the signal processing routines in Matlab (Johnson 1999c). For a more detailed implementation and discussion of spectrographic
analysis see Johnson 1999a. The somewhat complex mathematics is explained in Poularikas & Seely 1991: 259-260, or in any standard text on signal processing.
3. Comparison of Leading Notes in Op.135iii, bars 3 and 5
Figure 2 shows how 25 string quartets handle the tuning of the melodic leading-notes in bars 3 and 5 of Op.135iii. The ensembles are arranged in chronological order along the x-axis, from the Flonzaley Quartet's
recording of 1927 to the Leipziger Quartet's of 1999, and the sample includes a quite remarkable performance of an arrangement for the strings of the Vienna Philharmonic under Bernstein (No.22). This example
can be included because a well-trained string section plays with sufficient precision of intonation to generate very clear peaks in a spectrum analysis.
The upper plot relates to bar 3, and the lower to the equivalent tones in bar 5. The signs ∨ and ∧ indicate the tuning of the falling and rising leading notes respectively, calculated as intervals in cents from the
sustained Aflat3 in second violin. The zero line represents a Just major third (386 cents) between first and second violin. The equal tempered major third would be about 14 cents sharp, the Pythagorean, 22 cents
sharp.
Contrary to the principle of 'expressive tuning', the falling tone is on average the sharper, by about 5 cents in the upper plot and 6 cents in the lower. Indeed, 16 (13) out of the
twenty-five recordings tune the rising leading-note flatter than the falling one, whereas only 4 (8) tune the rising leading-note the sharper; 5 (4) are identical, within a safe margin of error of <3 cents
(figures for bar 5 in brackets). And we can note that all the entries below the zero line, i.e. flatter than Just, relate to the rising leading note in both plots. We can similarly compare the melodic semitone between
these leading notes and the preceding and following tonics. As we would expect, the sharp major thirds are reflected by narrow semitonal steps in the melody (Johnson 1999b).
Figure 2
Key:
1 Flonzaley (1927)
2 Busch (1934)
3 Loewenguth (?)
4 Budapest (1941)
5 Hungarian (1953)
8 Italian (1968)
9 Amadeus (1969)
10 Vegh (1973)
12 Talich (1977)
13 Amadeus (1982)
14 Melos (1984)
15 Vermeer (1985)
16 Guarneri (1987)
17 Lindsays (1987)
18 Emerson (1988)
21 Medici (1991)
22 Bernstein (VPO, 1991)
23 Juilliard (1996)
25 Leipziger (1998)
Bibliography
Adorno, T.W. (1997). Aesthetic Theory. (Translated by Hullot-Kentor from Ästhetische Theorie, Suhrkamp, 1970). London, Athlone Press.
Blum, D. (1977). Casals and the Art of Interpretation. Berkeley, Los Angeles, London, University of California Press.
Campbell, M. & Greated, C. (1987). The Musician's Guide to Acoustics. London and Melbourne, J.M. Dent.
Cook, N. (1999). Words about Music, or Analysis versus Performance. In Dejans (1999), pp.9-52.
Dejans, P. (Ed.,1999). Theory into Practice: Composition, Performance and the Listening Experience. Leuven, Leuven University Press.
Hagerman, B. & Sundberg, J. (1980). Fundamental Frequency Adjustment in Barbershop Singing. In Journal of Research in Singing, Vol.4 No.1, pp.3-17.
Johnson, P. (1997). Musical Works, Musical Performances. The Musical Times, August 1997.
Johnson, P. (1999a). Performance and the Listening Experience. In Dejans (1999), pp.55-101.
Johnson, P. (1999b). Intonation in Professional String Quartet Performance. ESCOM 1999 (proceedings forthcoming).
Johnson, P. (1999c). SPAN: spectrographic analysis of musical extracts in the Matlab environment. Birmingham Conservatoire.
Makeig, Scott (1982). Affective versus Analytical Perception of Musical Intervals. In Clynes, M., Music, Mind and Brain: The Neuropsychology of Music. New York & London, Plenum Press.
Poularikas, A.D. & Seely, S. (1991). Signals and Systems. Boston, PWS-Kent.
Seashore, C. (1938). Psychology of Music. McGraw-Hill, 1938, reprinted by Dover Books, New York, 1967.
Tarasti, E. (1994). A Theory of Musical Semiotics. Bloomington & Indianapolis, Indiana University Press.
Proceedings paper
Key phrases: Music perception, meaning, aesthetics, performance variability, factor analysis
Both the music structure and the music performance can in themselves be powerful means of communication.
Listeners can typically identify the emotional character of a composition in the absence of expressive devices, for
example through a deadpan (without variability) sequencer realisation (Thompson & Robitaille, 1992).
Performances of the same melody can also be made to communicate different emotional expressions (Gabrielsson &
Juslin, 1996). Although both levels of communication can thus be effective, at least as far as emotions are
concerned, the question remains how they interact in normal music performance. Given that (systematic)
performance variability (PV) is almost ubiquitous in music performed by people, what purpose does it serve that is not
satisfied by other aspects of the music?
Several different scenarios can be imagined. Generative theory (e.g. Clarke, 1986; Lerdahl & Jackendoff, 1983)
implies that the role of performance expression is to facilitate the communication of the music structure. According
to this view, the performer's creative space must be constrained by the structure, and acceptable performances
according to experienced listeners can only exist within narrow bounds. It might for example only be possible to
reinforce or attenuate the same pattern of perceived expression ratings, not alter it altogether. This can be called a
reinforcement hypothesis. A different view has been advocated by Repp (1998), who suggests that the structure
provides a frame in which an expressive landscape can be painted by the performer. For example, a trivial melody or
rhythm may call for the performer to endow it with more content, or a slow tempo or a low event density may call
for things to happen so that the listeners will not grow bored. As a result, we would expect greater difference in
ratings as a function of PV for melodies with simpler structure or lower event density, constituting a compensation
hypothesis. Finally, structural information and PV might convey different kinds of expressions. It has recently been
demonstrated that emotional intentions could to a certain extent be identified through PV alone (Juslin & Madison,
1999), which might for example raise the question whether other kinds of expression may capitalise more on
structural features. This could be called a dissociation hypothesis.
The present study was designed to explore the relations between music structure and performance variability, by
comparing ratings of expression for natural performances with performances from which all PV was removed. To
this end, generality and ecological validity were sought by sampling melodies that were unknown to both performers
and listeners from a body of publicly available Western music, in order to avoid possible extramusical and individual
associations. The performances should also be natural, and although this is left to the performers' discretion, highly
experienced professionals can be expected to share an understanding of what is musically appropriate. The piano
was considered the most suitable instrument due to its limited means for expression. Neither performers nor listeners
would expect or try to use any other cues than tempo/timing, loudness, and articulation, and the risk for inconsistent
use of these cues is small. Fewer cues are also easier to control experimentally.
The study incorporates the generation of performances of 25 different melodies, a manipulation of these so as to
eliminate the PV, and a listening experiment.
Method
Performance session
One female and two male professional keyboard musicians, aged 34, 45, and 31 years, participated. Each had played a
keyboard instrument since early childhood (for at least 25 years) and had worked professionally for at least 10 years.
All frequently performed on both piano and organ, and they were paid for their voluntary and anonymous
participation.
The performers played a Casio Celviano AP-20 digital piano and received feedback through Sony CD-250
headphones, while the MIDI signal was recorded and stored in files on a PC by means of a MIDI interface.
The melodies were selected to represent a wide range of structural features, such as harmonic mode, metre, interval
size, and rhythmic and melodic complexity. The performances were between 13 and 31 seconds in duration (M =
19.3).
Performance manipulation
A deadpan realisation of each performance was made by creating a new MIDI file based on means for the three
performance variables (tempo, loudness, and articulation) in the corresponding original performance. The 75 original
and the 75 manipulated MIDI files were played back through the Casio piano, and the sounds were recorded digitally
on disk.
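The deadpan manipulation, replacing tempo, loudness, and articulation with their per-performance means, can be sketched roughly as below. The note representation and helper names here are my own assumptions; the paper does not describe its MIDI processing at this level of detail.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Note:
    score_onset: float   # position in beats, from the score
    score_dur: float     # nominal duration in beats
    onset: float         # performed onset, seconds
    dur: float           # performed sounding duration, seconds
    velocity: int        # MIDI velocity (loudness)

def deadpan(notes):
    """Replace tempo, loudness and articulation with their means."""
    # Mean tempo: seconds per beat across the whole performance
    span = notes[-1].score_onset - notes[0].score_onset
    beat = (notes[-1].onset - notes[0].onset) / span
    # Mean articulation: sounding duration relative to nominal duration
    art = mean(n.dur / (n.score_dur * beat) for n in notes)
    vel = round(mean(n.velocity for n in notes))
    return [Note(n.score_onset, n.score_dur,
                 notes[0].onset + (n.score_onset - notes[0].score_onset) * beat,
                 n.score_dur * beat * art,
                 vel)
            for n in notes]
```

The result is a performance with strictly isochronous beats, a single velocity, and a single articulation ratio, i.e. one in which the coefficient of variation of each expressive variable is zero while its mean level is preserved.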
Listening experiment
Six male and four female musicians, 31 to 52 years of age (M = 43) participated. All had played an instrument for at
least 15 years and were currently active musicians.
The number of adjectives to rate was kept to a minimum so as not to exhaust the listeners. The words angry, calm,
complex, fearful, happy, longing, musical, sad, solemn, stable, tender, and tense were selected as being
representative, based on several studies of dimensionality for full performances (Campbell, 1942; Hevner, 1936;
Watson, 1942; Wedin, 1969, 1972a, 1972b). In addition, expressive and beautiful were included to test for the
presence of PV and the agreeableness of the performances.
The stimuli were presented in 6 melody × performance blocks, arranged such that each melody only occurred once
within each block, and the 6 performance conditions (3 performers × 2 PV) were rotated within the block. The
purpose of this design was to maximise the distance between presentations of the same melody, in order to avoid
contrast effects due to the comparison of different performances – while still retaining a representative balance
between the performance conditions within each block. Each listener individually attended four sessions and the
presentation order of blocks and performances within blocks was individually randomised. The first block in the
experiment contained 14 performances randomly sampled from the entire pool. Its purpose was to establish an
impression about the range of expressive features in the experiment, and these ratings were not included in the
analysis.
relative magnitude of dimensions that were actually used by the listeners. The inter-scale correlation matrix
indicated a cluster of "positive" adjectives, of which the highest correlation was for musical and expressive (.73). In
descending order we find correlations between tender and longing (.70), beautiful and musical (.63), beautiful and
expressive (.62), beautiful and longing (.61), and beautiful and tender (.60), followed by tender and calm (.56), angry
and tense (.55), and calm and tense (-.54). Exploratory FAs on a total of 72 cases (3 performers × 24 melodies)
favoured five-factor solutions for both performances with and without PV (based on the scree plot and
interpretability of factors) and the two sets of factors were similar, both in their respective explained variance (93.4%
and 86.4%) and factor loadings.
Factor I, with high loadings in expressive, musical, beautiful, and longing was interpreted as goodness or
pleasantness. Factor II, with high positive loadings for tense and angry, and high negative loadings for longing,
beautiful, tender, and calm was interpreted as tension versus calm, and factor III as fear and sadness versus
happiness. It is unusual that fear and sadness join in a factor, but one explanation may be that terms representing fear
and music expressing fear have rarely appeared together. Factors IV and V were identified as complexity versus
stability and solemnity. Most of the effect of PV on complexity–stability is probably trivially related to the stable
pulse which results from removal of PV. High factor loadings, with at least one loading > .80 for each factor,
indicate that the factors successfully compress the semantic scales into dimensions which can be given well-defined
semantic labels. These five dimensions are summarised in Table 1, and will in the following be called goodness,
tension–calm, fear–happiness, complexity–stability, and solemnity.
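The factor-retention step, inspecting a scree plot of eigenvalues, can be sketched as follows. This is a generic illustration of the method, not the analysis code used in the study:

```python
import numpy as np

def scree_eigenvalues(ratings):
    """ratings: cases x scales matrix (here 72 cases x 14 adjective
    scales). Returns the eigenvalues of the inter-scale correlation
    matrix in descending order; the 'elbow' of their plot suggests how
    many factors to retain (five in the study)."""
    corr = np.corrcoef(ratings, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]
```

The eigenvalues sum to the number of scales, so each one can be read directly as the share of the total standardised variance carried by that dimension.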
Table 1
Interpretation of the five factors, and the semantic scales with the highest loadings in each factor. Factor loadings for
adjectives in parentheses are smaller than 0.5.
     Interpretation                     + adjectives                     - adjectives
I    Goodness, expressivity            Expressive, Musical, Beautiful
II   Tension, anger–Tenderness, calm   Angry, Tense, (Complex)          Tender, Calm, Longing, Beautiful
III  Fear, sadness–Happiness           Fearful, Sad                     Happy
IV   Complexity–Stability              Complex                          Stable
V    Solemnity                         Solemn
Under the assumption that these dimensions provide a valid data reduction and an interesting level of analysis,
another FA was performed on the means across performers for both PV conditions, and the factor scores for all 144
(2 PV × 3 performers × 24 melodies) cases were submitted to a three-way ANOVA for each factor, with the highest
interaction as error term. The results, summarised in Table 2, show that there were significant main effects of PV for
goodness, tension–calm, and complexity-stability. There were also main effects of performers for all factors but
fear–happiness, and of melody for all factors. The effect sizes d’ related to performers were, however, smaller than
0.20 (typically < 0.10), and these effects are therefore unlikely to be of much importance.
Table 2
Summary of three-way repeated measures ANOVAs for the factor scores related to a five-factor solution, in terms of
p levels for all effects.
PV    Perf.    Melody    PV × Perf.    PV × Mel.    Perf. × Mel.    PV × Perf. × Mel.
— n.s., *p < .05, **p < .01, ***p < .005, ****p < .001, *****p < .0001. a Highest-order interaction was used as
error term.
The main effects of PV were to increase the melodies’ goodness and decrease their tension. It also increased the
scores for complexity-stability, which was however essentially an effect of the isochronous (stable) pulse in deadpan
performances, as indicated both by interviews and adjective ratings. Finally, PV had no main effect on the
dimensions characterised as solemnity and fear–happiness. The interaction between PV and melody on goodness
suggests that natural variability patterns might have quite different effects for different structures. The factors will,
under the term perceived expressive dimensions (PEDs), replace the 14 adjective scales in the following analyses.
Contribution of structure, performance levels, and performance variability to the PEDs
The purpose of this section is to explore the relative contributions to the PEDs among a set of potentially relevant
variables describing the performances. Although this approach does obviously miss much of the detail of the
phenomena, it might tap possible general relations across the range of different music structures.
Overall tempo, loudness, etc., have substantial effect on, for example, ratings of emotion words (Juslin, 1997). It is
therefore important to account also for the mean performance levels (MPLs) of the expressive
variables although they were not experimentally controlled. MPLs were measured in terms of the means for all notes
across each performance, and its index of dispersion – that is, performance variability – is here expressed as the
dimensionless coefficient of variation (SD / M). Tempo is expressed as the intervals between subjectively perceived
beats (the tactus) in the sounding music, for the sake of consistency with other measures of time. Thus, higher MPL
values indicate high loudness (vs. low), legato articulation (vs. staccato), and long tactus duration (TD), that is, slow
tempo (vs. short duration).
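The MPL and its dimensionless variability can be computed per variable as, for example (a minimal sketch; the function name is mine):

```python
from statistics import mean, stdev

def mpl_and_pv(values):
    """Mean performance level (MPL) and performance variability (PV)
    for one expressive variable across a performance; PV is the
    dimensionless coefficient of variation, SD / M."""
    m = mean(values)
    return m, stdev(values) / m

# e.g. inter-beat (tactus) intervals in seconds:
ibis = [0.50, 0.52, 0.48, 0.55, 0.45]
m, cv = mpl_and_pv(ibis)   # m = 0.50 s per beat
```

Dividing by the mean makes the variability comparable across variables measured in different units (seconds, MIDI velocity, articulation ratio) and across performances at different overall tempi and dynamic levels.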
The nine structure variables were based on transcriptions of the melodies (not shown here). Mode was coded as
major (1), minor (2), and mixed (3), and melodic complexity as conforming to functional harmony (1), chromatic or
modulated (2), and free-tonal (3), according to the author’s judgement. Pitch, interval, and duration levels refer to the
number of different values, regardless of their distance or frequency, and range is the number of semi-tones between
the highest and lowest value. For time, it seemed more appropriate to measure the range in terms of the ratio
between the longest and shortest duration.
The usefulness of these variables depends on the amount of variance in the listeners' responses that they explain.
Substantial multicollinearity can be expected within each group of variables (structure, MPL, and PV), but not
between the groups, and a hierarchical multiple regression analysis (MRA) was therefore performed for each PED
on its mean factor scores across the ten listeners (72 cases). The same kind of MRA was also made for the original
performances, so as to detect any differences in the way listeners used structure and MPL when PV was absent.
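A hierarchical MRA of this kind enters the predictor blocks cumulatively and reads off the increment in R² at each step. A generic NumPy sketch (variable names are mine, not the study's):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def hierarchical_r2(blocks, y):
    """Enter blocks (e.g. structure, then MPL, then PV) cumulatively;
    successive differences between the returned R^2 values are the
    increments attributable to each block."""
    r2, entered = [], []
    for block in blocks:
        entered.append(block)
        r2.append(r_squared(np.column_stack(entered), y))
    return r2
```

Because R² can only grow as predictors are added, entering the blocks in a fixed order (structure first, PV last) credits any shared variance to the earlier blocks, which is the conservative choice for assessing what PV adds.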
Figure 1 shows that all variables together could explain between 34 and 63 percent of the variance across all PEDs
and the two PV conditions. The variance explained by structure is highest for perceived complexity (49%), as might
be expected, but only for deadpan performances.
Figure 1. Cumulative proportion variance explained by hierarchical MRA for three blocks of variables; structure,
MPL, and PV. Each combination of a PED (1-5) and a PV condition (with or w/o PV) is represented by a column,
whose total height is equal to R2 for all predictor variables together in a simultaneous MRA. All differences are
statistically significant (p < .05) except the increments due to PV for fear-happiness.
It is notable that the presence of PV seems to increase the contribution from MPL for goodness, but decrease it for
tension–calm and solemnity. Also, PV decreases the contribution of structure for goodness and complexity–stability.
These results might indicate a dynamic effect of PV, in that it directs the listeners' attention to certain features. For
example, if one can hear that the loudness varies, then it can be assumed that loudness conveys some kind of
information, which makes both the local and overall level of loudness more interesting.
Interaction structure–performance variability
This section focuses on the three hypotheses – reinforcement, compensation, and dissociation. The principal variable
will be the difference dPV = original - deadpan, for which a positive value indicates that the factor scores, and hence
some of the positively loaded adjective scale ratings, were higher with PV. A correlational analysis between ratings
of original performances and dPV on the one hand, and the structure and performance variables on the other, was
made across the entire set of 72 performances.
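The quantity dPV and its correlations with the predictors can be computed directly; a minimal sketch (array names are my own):

```python
import numpy as np

def dpv_correlations(original, deadpan, predictors):
    """original, deadpan: factor scores for the 72 performances under
    the two PV conditions. predictors: 72 x p matrix of structure and
    performance variables. Returns one Pearson r per predictor,
    correlating it with dPV = original - deadpan."""
    dpv = np.asarray(original) - np.asarray(deadpan)
    P = np.asarray(predictors)
    return np.array([np.corrcoef(P[:, j], dpv)[0, 1]
                     for j in range(P.shape[1])])
```

Comparing the sign of each such correlation with the sign of the corresponding correlation for the original performances then distinguishes reinforcement (same sign) from compensation (opposite sign), as argued below.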
The compensation hypothesis suggested that not only complexity but also the time available should affect the PV.
There are at least two levels of time in music that might be relevant in this context, namely tempo (as measured by
TD) and event duration (ED), which is the time between onsets of note events. Whereas TD is nominally
isochronous, ED is also a function of the structure. There will be large differences between mean, minimum, and
maximum ED when there are many levels of note durations. As it was not obvious which measure is the most
important for PV, all were included in the analysis.
Table 3 shows the correlations between factor scores for original (left half) and dPV (right half), and the structure
and performance variables. If the corresponding correlations for original and dPV have the same sign, it means that
high factor scores are decreased more as a function of removing PV than smaller scores are, which is in line with the
reinforcement hypothesis. According to the same reasoning, opposite signs between the left and right half of Table 3
support the compensation hypothesis.
Table 3
Correlations between structure and performance variables (rows) and PEDs based on ratings (columns). Left and
right panels of the table compare original performances and the differences between original and deadpan in terms of
mean factor scores across 10 listeners (N = 72).
Original dPV = Original – Deadpan
FI FII FIII FIV FV FI FII FIII FIV FV
Mode .26 * .07 .37 * .30 * -.15 -.21 .17 .25 * -.23 * -.04
Melodic complexity .15 -.01 .32 * .30 * -.03 -.20 .01 .17 -.21 -.10
Pitch levels .13 -.12 .41 * .01 .04 -.09 -.07 .05 -.11 -.15
Pitch range .24 * -.23 .32 * -.06 .06 -.09 -.04 -.02 -.09 -.02
Interval levels -.06 -.26 * .05 -.02 .19 -.00 .05 -.08 .04 .02
Interval range .14 -.08 .14 .14 .18 .05 .07 -.28 * .04 .09
Interval size -.03 .16 -.00 .17 -.02 .08 .20 .19 -.28 * .03
Duration levels .21 .06 .30 * .28 * -.19 -.18 .05 .27 * -.14 -.18
Duration range .33 * -.09 .17 .05 -.14 -.31 * -.04 .19 .03 -.19
Loudness .05 .58 * .09 .11 .08 -.22 -.25 * -.15 -.12 .04
Articulation (legato) .09 -.10 .24 * -.06 .09 .07 .17 .38 * -.17 -.18
Tactus duration -.01 -.16 .04 -.04 .22 .29 * .09 .16 .00 .01
Event duration min. -.10 -.23 * -.09 -.14 .34 * .27 * .01 -.07 .03 .02
Event duration Md -.08 -.31 * .29 * -.10 .34 * .20 .12 .16 -.13 -.01
Event duration max. .20 -.08 -.03 .34 * -.09 .06 .02 .19 -.15 -.18
For goodness, the correlations with mode, melodic complexity, duration levels, and duration range all have opposite
signs for original and dPV. These four variables measure the structure complexity as inferred from the scores, and
are also positively correlated with the listeners’ ratings of complexity (R2 = .61 for all 9 structure variables). Thus,
various indicators of complexity apparently increase ratings of goodness, but also decrease the effect of PV on
goodness. The same phenomenon is found for complexity–stability, but this will be ignored because it is likely a
trivial effect of deadpan performances being rated as more stable. Opposite signs are also found for loudness and
event duration (ED) in tension-calm.
All in all, these results indicate that the ratings behind the goodness and tension-calm PEDs are subject to
compensation by means of the variability that the performers impose when attempting to make a natural
performance. One could say that structures which are perceived as less pleasant – predominantly the simpler ones –
are made more pleasant, whereas already pleasant structures do not need as much improvement (based on the fact
that the main effect of PV on goodness is positive). Likewise, the compensatory effect on tension-calm is that
structures which appear more tense in a deadpan performance are made more relaxed and tender by the performance
variability than those that appear less tense in their deadpan version.
The two remaining PEDs, fear–happiness and solemnity, do not follow the same pattern. The three correlations
including fear–happiness that are significant for both original and dPV are all positive, which speaks in favour of the
reinforcement hypothesis for this PED. That is, when mode, duration levels, and articulation affect the ratings along
the fearful/sad-happy dimension, PV increases those effects further, making a sad or fearful performance sound even sadder, and so on. The bipolarity of fear–happiness is probably the reason why it did not demonstrate any
significant main effect according to the ANOVA (Table 2). Finally, there are no significant correlations at all
including dPV for solemnity, which is in line with its lack of main effect of PV.
General discussion
Whereas it has generally been assumed that the purpose of PV is to communicate the structure more effectively, recent work seems to refute the perceptual reality of such a claim (Tillman & Bigand, 1996) and argues that PV might instead serve to communicate the performer's aesthetic expression (Repp, 1998). In other words, although variability
patterns may perfectly well reflect performers’ overlearned hierarchical representations of the structure, it is doubtful
whether the ordinary listener would notice. In any case, the short melodies used in this study do not comprise many
organisational levels to convey.
The question remains why there was nevertheless a substantial amount of variability, and why it affected the
listeners’ perception in certain directions. If the purpose were to improve the communication of structure, consistent
effects of PV would have been less likely in view of the wide range of different melodies. However, a few possible
sources of errors in this study must be considered.
First, it cannot be excluded that the crude elimination of PV might have caused some artefactual interaction with the other expressive devices, such as tempo. It seems unlikely, however, that such phenomena would be consistent enough to
cause significant effects across the relatively large number of very different structures.
Second, three performers and ten listeners are admittedly few. However, it was believed that these performers' long experience would ensure that they correctly anticipated listeners' ideals. Musicians were chosen as listeners because
they were believed to be more internally and inter-individually consistent, and less susceptible to superficial
differences between the melodies.
Third, the setting was constrained to monophonic performances and the few expressive variables available on the
piano. It is possible that this paucity precludes several expressive scenarios, and one should be careful when
attempting to generalise to other conditions. But it is also possible that the performers' high level of skill enabled them to approach what would have been achieved in a setting more favourable to expression.
Bearing these concerns in mind, a more general problem lies with the concept of performance expression,
specifically with its definition as systematic deviations from the "neutral" structure given in a score (Clarke, 1996).
This is obvious when the music is not notated, but one can also question the theoretical arguments for taking the
score "literally" in the sense that equal units of musical time should necessarily correspond to equal units of physical
time. However, extending the definition to include deviations from a norm (Desain & Honing, 1992, p. 175) could
raise even more severe problems, depending on what "norm" is meant to be. For example, Repp (1998) found that
one pattern of timing deviations accounted for 61.4 percent of the (timing) variance among 115 commercially
available recordings of Chopin's Etude in E major. As this pattern can justifiably be regarded as a norm, should we
consider the performances which were closest to this norm to be inexpressive? And if the norm or the most
pleasurable way to play a piece deviates in a specific way from the notation, why have we not developed a notation
with the appropriate level of resolution?
Generative theory suggests that rule-based transformations of canonical values constitute an economical way of rendering performances which convey the hierarchical structure of the music. In addition, performers' internal
representations may differ, and give rise to an aesthetic variety from which individual listeners can choose.
However, most music has a simple large-scale structure with repetitions of two or three sections, and is frequently
performed with uniform tempo and loudness, at least within consciously audible thresholds. The generative scenario can therefore only apply at the margins of the repertoire, where compositions are multi-level hierarchical and the performer is allowed to impose large-scale deviations. And even under these conditions, it appears the listeners
would have to know the piece very well in order to appreciate both the large-scale structure and to recognise
variability patterns that help to convey it (e.g. Cook, 1987; Karno & Konecni, 1992; Konecni, 1984; Gotlieb &
Konecni, 1985). For example, Tillman and Bigand (1996) arbitrarily shuffled musical chunks (~6 s duration) in
commercially available recordings of classical piano music, which means that any hierarchical performance
variability patterns must also have become scrambled. However, the listeners' appreciation of the music was not
reduced by the shuffling.
It has also been suggested that music elicits or communicates emotions. Approaches range from the speculative through the purely theoretical to the empirical; much of the empirical work has investigated how representations of emotions are expressed and decoded. Although the melodies in the present study were selected in part for their wide range of
emotional character, the small reinforcements found for fear, sadness, and happiness are not strong or consistent
enough to warrant a simple communicative interpretation. The results do rather suggest a general tendency to
decrease tension and increase goodness, whatever the latter may stand for in a wider psychological perspective.
Inconsistent as the present results may be, they may help to point in certain directions concerning the function of performance variability and the role of the performer. Further research along this path might prove effective in elucidating the performer's contribution to the qualities that music conveys.
References
Campbell, J. G. (1942). Basal emotional patterns expressible in music. American Journal of Psychology, 55, 1-17.
Clarke, E. (1986). Theory, analysis and the psychology of music: A critical evaluation of Lerdahl, F. and Jackendoff, R.: A Generative Theory of Tonal Music. Psychology of Music, 14, 3-16.
Clarke, E. (1996). Expression in performance: generativity, perception, and semiosis. In J. Rink (Ed.), The practice of performance (pp. 21-54). Cambridge: Cambridge University Press.
Cook, N. (1987). The perception of large-scale tonal closure. Music Perception, 5(2), 197-206.
Desain, P., & Honing, H. (1992). Music, Mind and Machine. Amsterdam: Thesis Publishers.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the performer's intention and the listener's experience. Psychology of Music, 24, 68-91.
Gotlieb, H., & Konecni, V. J. (1985). The effects of instrumentation, playing style, and structure in the Goldberg Variations by Johann Sebastian Bach. Music Perception, 3, 87-102.
Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48, 246-268.
Juslin, P. N. (1997). Perceived emotional expression in synthesized performances of a short melody: Capturing the listener's judgment policy. Musicae Scientiae, 1(2), 225-256.
Juslin, P. N., & Madison, G. (1999). The role of timing patterns in the decoding of emotional expressions in music performances. Music Perception, 17, 197-221.
Karno, M., & Konecni, V. J. (1992). The effects of structural interventions in the first movement of Mozart's symphony in G minor, K. 550, on aesthetic preference. Music Perception, 10, 63-72.
Konecni, V. J. (1984). Elusive effects of artists' "messages". In W. R. Crozier & A. J. Chapman (Eds.), Cognitive processes in the perception of art (pp. 71-96). Amsterdam: North-Holland.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, Massachusetts: MIT Press.
Repp, B. H. (1998). A microcosm of musical expression I: Quantitative analysis of pianists' timing in the initial measures of Chopin's Etude in E major. Journal of the Acoustical Society of America, 104(2), 1085-1100.
Thompson, W. F., & Robitaille, B. (1992). Can composers express emotions through music? Empirical Studies of the Arts, 10(1), 79-89.
Tillman, B., & Bigand, E. (1996). Does formal musical structure affect perception of musical expressiveness? Psychology of Music, 24, 3-17.
Watson, K. B. (1942). The nature and development of musical meanings. Psychological Monographs, 54(2).
Wedin, L. (1969). Dimension analysis of emotional expression in music. Swedish Journal of Musicology, 51, 119-140.
Wedin, L. (1972a). A multi-dimensional study of perceptual-emotional qualities in music. Scandinavian Journal of Psychology, 13, 1-17.
Wedin, L. (1972b). Multidimensional scaling of emotional expression in music. Swedish Journal of Musicology, 54, 1-17.
Proceedings abstract
Mr Erik Lindström
Erik.Lindstrom@psyk.uu.se
Background:
Aims:
The main purpose of this investigation was to elucidate the effects of factors
in music performance and melodic structure in achieving emotional expression.
How is the inherent expressiveness of a melodic structure realised in actual performance? How is a happy tune performed so as to sound happy? Is it possible to increase the expressed happiness of an inherently less happy tune by adequate performance? Are certain notes of more importance, and articulated more explicitly, in some expressions?
Method:
Two performers were asked to play these versions so as to clearly express the respective emotions. A listening test was conducted to evaluate their communication of emotion to listeners.
Results:
Conclusions:
Means for emotional expression may be found in (a) the musical structure per
se, (b) in adequate performance, as well as (c) in the interaction between
structure and performance.
Proceedings abstract
College of Education
UNIVERSITY OF IOWA
Background
Aims
Results
Results and discussion will focus on the implications for music teaching and
learning. The interpretative performance task has already been piloted by the
author in the United States, as has the memory task by Prof. Belardinelli in
Italy.
Conclusion
This study is the first attempt to use these measures together to obtain a richer profile of children's musical thinking. We believe the gathering of cross-cultural data will promote more generalizable conclusions about children's musical development.
Proceedings paper
important determinant of friendship (Aboud and Mendelson, 1996; Hallinan, 1980). Some research shows that teachers' attempts to use friendship cliques as a means of judging academic achievement have been largely unsuccessful (Waller, 1932; Brady-Smith, Newcomb, and Hartup, 1978, cited in
Foot et al, 1980), despite other evidence demonstrating that there is a positive relationship between
high cohesive groups and higher achievement (Seashore, 1954; Kruger and Tomasello, 1986).
One theory of why the collaborations of friends and non-friends may have different effects on
achievement is that conflict occurs more often in friendship groups. Using the Piagetian perspective,
psychologists believe that socio-cognitive conflict is the distinguishing factor of successful and
non-successful peer collaboration. When children experience confrontation whilst attempting to solve
a problem, the effect is 'disequilibrating'. The conflict forces the children to attempt an understanding
of the alternatives, and consequently they achieve a more in-depth and objective understanding of the
task (Kruger, 1993). However, social determinist psychologists believe that conflict does not lead to
productive group work. It is thought that it is cooperation and not conflict which will result in
cognitive change (Vygotsky, 1978). Additional reasons as to why friendship group collaborations may
be more successful than non-friendship groups are that individuals share more information with
friends than with non-friends (Newcomb and Brady, 1982) and that friends create a more desirable
working atmosphere (Foot, Chapman, and Smith, 1977).
Of course, friendship groups may not produce wholly desirable behaviours. Whilst they may
collaborate in a more positive and friendly way, there is a chance that this will provoke more off-task
behaviour and general chat or play. However, despite this, it appears that friendship groups have the
potential to create an effective working environment, with good interactions, which have the potential
to produce high quality results. In light of the positive and negative effects of friendship, it is
necessary to consider exactly what defines effective interaction.
Effective interaction is commonly understood to involve contribution from all group members. Examples of such behaviour within a group include participation, cooperation, listening, helping, negotiating, and developing ideas with others. If individual group members interact in this way, it is
suggested that the group is collaborating successfully (Plummer and Dudley, 1992; Bales, 1950a;
Kruger, 1992).
There has been little research directly related to friendship group interactions and musical
composition. Morgan, Hargreaves, and Joiner (1997-98) investigated how peers collaborate in
compositional tasks, and the influence this has on the resulting music. They conducted two studies
investigating group interactions and the effects on musical composition. Their research demonstrated
the importance of gender as a factor in group collaborations, highlighting the dominance of girls in the
groups, and their overall ability to cooperate more successfully. Mixed gender groups cooperated less
well than homogeneous groups, with all-girl groups producing the most effective compositions. Since
friendship groups are usually same-sex, these findings suggest that friendship groups may well have a
positive effect on collaboration.
It appears that the study of group collaboration is relatively under-researched, and that many questions remain unanswered. Evidence suggests that whilst group work is highly
advantageous, the specific conditions in which it should occur remain unclear. Comparisons between
variations of grouping such as friendship, ability, and gender, and their different influences on the
quality of interaction and outcome, have yet to be drawn. Furthermore, very little evidence considers
the impact of group work on musical activities.
EXPERIMENTAL STUDY
The current research aimed to investigate how various groupings of children, particularly friendship
groups, influence collaboration focusing on composition quality and the quality of interactions. Five
groupings were used for this purpose: friendship, non-friendship, matched-intelligence,
mixed-intelligence, and random. In addition, the study was conducted in two schools, one with a reputation for musical activities and one which offers its pupils few musical opportunities, in order to establish whether any differences existed between them.
Research highlights the benefits of friendship group collaborations, and so it was anticipated that friendship groups would produce the highest quality compositions. Ability has also been found to have an important influence in past studies, and so it was thought that mixed ability groups would
produce more effective results than matched ability groups. Gender has been shown to play an
important role in the selection of friends in younger children, which means that friendship groups
consist of same-sex group members. It was hypothesised that since gender is such an influencing
factor on friendship groups in this age group, homogeneous gender groups would produce higher
quality compositions than heterogeneous gender groups. An additional research aim considered the
effect of school and whether there is any difference between the quality of composition produced in
each. It was thought that the children in the ‘musical’ school would create higher quality compositions
than the children in the ‘non-musical’ school, due to greater musical experience.
Friendship groups are more likely than the other grouping conditions to spend a substantial amount of
time together, and therefore it was anticipated that friendship groups would produce the most effective
group interactions. Ability, however, has not been found by previous research to be an influence on
the quality of interaction, and so it was not expected to be an influencing factor in the present study. It
was anticipated that the quality of outcome should reflect the standard of interaction. If a group does
not collaborate effectively, then it could be considered that the outcome will not be as good as that
produced by a group which has a higher quality interaction. Therefore it was expected that the highest
quality composition would occur with the most effective interaction.
METHOD
Participants: The participants used in this study were from two different schools - one of which does
not have an active music education (school A), and the other which does (school B). The children
were in Year 6, and therefore aged either 10 or 11. There were 29 children from school A: 11 boys and 18 girls. There was a total of 30 children from school B: 17 boys and 13 girls.
Data was collected by the researcher and the class teachers, who observed the children's interactions.
The researcher and an additional, independent individual marked the compositions, and an extra
person operated the video camera.
Procedure: Initial planning. The first stage of the study involved interviewing the teachers in order to
discover information about the friendship groups within the class, normal working groups, and
amount of music experienced in class. Teachers additionally provided approximate levels of
attainment for each child. The children were asked to complete a pre-study questionnaire which asked
about their friends, usual working partners, musical experience, and whether there was anyone in the
class that they did not wish to work with.
Once the data had been collected, children were allocated to groups. In school A, 29 children were
organised into 5 groups; 4 with 6 members, and 1 with 5. In school B, 30 students were organised into
5 groups of 6 children. As mentioned previously, 5 different group variables were manipulated in this
research. For the control variable, children were randomly allocated into groups. The ability matched
groups were organised as far as possible to ensure that each group contained children of the same
grade. Where this was not possible, care was taken to ensure that similar grades were present in each.
For the mixed ability variable, one student from each ability grade (1-5) was placed in each group. For
the groups of 6, the procedure remained the same as for the groups of 5, with the extra children
randomly assigned to each group. The children usually had pre-established friendship groups of 5 or
6, and these remained the same in the friendship variable. In this variable it was more difficult to accommodate those children who were isolated or generally disruptive, and with whom nobody wanted to work. Where possible, these children were placed with children whom they had named as their friends, although the feeling may not have been reciprocated. The non-friendship groups involved
mixing those children together who had specified a preference not to work together. However, care
was taken to confirm the groups with the teacher, just in case there was a particularly volatile
arrangement.
The sessions: As far as possible, the research sessions proceeded in the same way for the two schools.
For school A, three visits were made. Each lasted approximately 2 hours. For school B, there were
two visits, each lasting approximately 2½ hours. A total of 10 compositional tasks were used in all.
They were taken either from existing education literature (Paynter, 1992; Mills, 1995; Morgan et al,
1997-98) or were specially constructed by the researcher.
The classroom was organised so as to place the tables as far away from each other as possible, and seats were arranged around each table so that all of the group members could see each other, thereby aiding participation and cooperation. On each table, 5 or 6 different percussion instruments (depending on the size of the group) were placed, in order to avoid arguments about which instruments the children wanted, and also to ensure that each group had different timbral sounds to explore.
The children were then told which groups they would be in for the first two tasks. A brief introduction
to the sessions was given by the researcher, explaining that they would be working with groups of
different children from within their class, to compose different pieces of music using the percussion
instruments. They were informed that the video camera was going to film the sessions, and also that
the teacher and the researcher would be observing their work. Additionally, it was explained that they
would be expected to perform their compositions at the end of each task. The first task was then
described. There was an opportunity for any questions or clarification. They were given a time limit of
15 minutes in which to compose a piece.
As the sessions took place, the two observers moved around the room, completing the questionnaires and noting information regarding the group leader, major contributors to the discussion, the type of interaction, the speed of progress, and whether they were operating as a group. The video camera
recorded those groups which were not being observed by other means, and the footage was later
analysed for signs of interaction quality by assessing aspects of non-verbal communication. The
researcher used the audio tape recorder on those tables she was observing, and this was later
transcribed and analysed to assess the quality of interaction. About 10 minutes into the session the
researcher informed the class that there were only five minutes remaining in which to organise their
final ideas, and prepare to perform their pieces. After the full 15 minutes, the groups proceeded in
numerical order to play their pieces whilst the video camera recorded them. Once all the pieces had
been performed, they were informed of their next task, and the procedure was repeated for the
remaining tasks.
After each group had completed the allocated two tasks, the children were asked to complete a
post-task questionnaire, concerning their perceptions of the group's interactions, whether or not they
enjoyed the activities, and who they did and did not like in the group. They were asked to complete
them on their own, without discussing them with the other group members.
Methods of assessment: data analysis. The performances of the compositions were marked using an adapted version of the "Teacher's constructs of music activities" criteria (Hargreaves, Galton, and Robinson, 1996), which operated on a scale of 1-7 between two polar points. There was, however, one change made to number 14 of these criteria. Whether the children were 'technically skilful or technically unskilful' was not deemed relevant to this study, and so it was replaced with 'task requirements fulfilled or unfulfilled', which was felt to be important given the precise nature of the composition tasks involved in this research. The compositions were marked by the researcher and a musician external to the study in an attempt to reduce bias.
The video footage, audio tape transcriptions, observer questionnaires, and participant questionnaires
were analysed for quality of interaction. A chart was created by the researcher which used criteria
suggested by past research (Kruger, 1992; Berkowitz, 1980) to constitute good, effective group
interaction. Included are equal contribution, participation and cooperation from all members, listening
to each other, and discussion. If all listed elements were present then the group received the maximum
score. Thus, the lower the score, the weaker the group interaction, and vice versa. A total score of 11
was possible.
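The additive scoring scheme described above can be sketched as a simple checklist function. The criteria names and point weights below are hypothetical: the text lists the criteria and the maximum total of 11, but not how the points were distributed among them, so this illustrates only the additive logic, not the actual instrument.

```python
# Illustrative sketch of the additive interaction-scoring scheme.
# The point weights are hypothetical: the study states only that the
# criteria sum to a maximum of 11, not how points were distributed.

CRITERIA_POINTS = {
    "equal_contribution": 3,  # hypothetical weight
    "participation": 2,       # hypothetical weight
    "cooperation": 2,         # hypothetical weight
    "listening": 2,           # hypothetical weight
    "discussion": 2,          # hypothetical weight
}  # totals 11, matching the maximum score reported


def interaction_score(observed):
    """Sum the points for every criterion observed in the group."""
    return sum(pts for name, pts in CRITERIA_POINTS.items() if name in observed)


# A group showing every listed behaviour receives the maximum score of 11.
print(interaction_score(set(CRITERIA_POINTS)))            # 11
print(interaction_score({"participation", "listening"}))  # 4
```

Under this scheme, a lower score directly reflects fewer observed collaborative behaviours, as in the original chart.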
RESULTS
Data was collected for both compositional and interactional quality, and the results for each were analysed independently using a three-factor Analysis of Variance (ANOVA), the factors being school, grouping, and composition.
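The F-ratio underlying such an ANOVA can be illustrated with a minimal one-factor computation; the study's actual three-factor analysis would normally be run in a statistics package, and the scores below are invented purely for illustration.

```python
# Minimal sketch of the F-ratio underlying an ANOVA, shown for a single
# factor only (the study used a full three-factor design). Data invented.

def one_way_f(groups):
    """F = between-groups mean square / within-groups mean square."""
    k = len(groups)                  # number of groups
    n = sum(len(g) for g in groups)  # total observations
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented composition-quality scores for three hypothetical groupings.
friendship = [6.0, 5.5, 6.5, 6.0]
random_grp = [4.0, 4.5, 5.0, 4.5]
mixed = [5.0, 4.5, 5.5, 5.0]
print(round(one_way_f([friendship, random_grp, mixed]), 2))  # 14.0
```

A large F indicates that the variance between grouping means is large relative to the variance within groupings; the p-values reported below come from comparing F against its distribution under the null hypothesis.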
A three-factor ANOVA on compositional quality found that the effect of grouping was not statistically significant. However, a significant effect of school was found (F(1,35) = 10.769, p = .0023), and there was a significant interaction between school and grouping (F(4,35) = 3.77, p = .0119). More detailed
analysis reveals that in school A, the higher attaining ability-matched groups appear to have created
better quality compositions than the lower attaining groups. This finding is interesting since school A
was the school which provided least musical instruction. The ANOVA also revealed that the
compositional tasks did not influence the standard of composition, which suggests that no task was
more difficult than another, a factor that could have potentially distorted the results.
Grouping, however, was found to have a highly significant influence on the quality of interaction (F(4,35) = 6.814, p = .0004). Friendship groups produced the highest interaction scores in both schools. The Fisher PLSD revealed that the friendship groupings produced significantly higher results than the other groupings (p < .05). The mean marks across the two schools also reveal that the non-friendship variable produced the lowest interaction scores.
Whilst no statistically significant difference was found between the quality of interaction in the two schools, there was some effect of school and grouping on the group interaction scores (F(4,20) = 3.2, p = .0243). The analysis also revealed that the different compositional tasks influenced the interactions (F(16,140) = 2.57, p = .0097).
A one-factor ANOVA testing the effect of ability on the quality of interaction revealed that the result for school B was significant (p = .0216), although the result for school A was not. Gender was also found by a one-factor ANOVA to be an important influence on the quality of interaction (p = .0059).
The Fisher PLSD reveals that both the homogeneous-boys and homogeneous-girls groups produced higher quality interactions than both the mixed gender groups (1.544) and those groups which consisted of one girl and four or five boys (1.947). This allows the conclusion that homogeneous gender groups in this age group collaborate more effectively than heterogeneous gender groups, supporting the findings of Morgan et al (1997-98).
A Spearman’s correlation revealed no relationship between the quality of interaction and the quality
of composition.
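Spearman's correlation, as used here, compares the rank orderings of the two sets of scores rather than their raw values. A minimal sketch (assuming no tied values, which would require mid-rank averaging; the group scores below are invented) is:

```python
# Minimal sketch of Spearman's rank correlation (rho), assuming no tied
# values. The interaction/composition scores below are invented.

def ranks(xs):
    """Rank of each value (1 = smallest), assuming all values are distinct."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = rank differences."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical interaction and composition scores for five groups.
interaction = [11, 7, 9, 5, 8]
composition = [6.5, 5.0, 4.0, 5.5, 6.0]
print(round(spearman_rho(interaction, composition), 2))  # 0.3
```

A rho near zero, as in the finding reported above, means the groups that interacted best were not systematically the groups that composed best.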
DISCUSSION
Contrary to expectation, grouping was not found to influence the quality of composition. There are
several possible explanations for this finding. One possible reason is that when friends work together,
there is always the danger that there will be some off-task discussion and general playing around. One
example of this occurred in school A. The boys in the fourth group were trying to decide which
emotion they wanted to depict musically. Having established that they would try and depict sadness,
the amount of on-task talk was very limited. It seems, therefore, that friendship groups may not
achieve higher results than any other grouping simply because the amount of on-task behaviour is
less. A second possible explanation is a methodological one: the ability levels provided by the
teachers were approximate, and the grades did not consider musical ability. The estimated grades do
not reflect aptitude for particular subject areas (for example, if a child is of average ability but
outstanding in Art, the grade does not represent the fact that s/he is better at creative subjects than
scientific subjects). Nor is music an assessed part of the National Curriculum at this level, and so the
children's estimated grades did not include their musical ability.
Another possible reason why grouping did not have the anticipated effect on compositional quality is
that the majority of previous research does not focus on musical activities. The fact that the findings
of the current research do not concur with existing evidence may be because past studies have focused
on settings other than music, and so the results cannot be generalised to this particular domain.
Furthermore, music is seen as a specialised field, and unlike other academic subjects such as science
and mathematics, it is given very little emphasis by the National Curriculum. Children, therefore, may
have a lack of experience with music, which may have an effect on the results.
However, although there was no effect of grouping on compositional quality, there was a significant effect on the quality of interaction. Friendship groups interacted most effectively, and there are several
possible explanations for this. Firstly, it is obvious that friends spend a lot of time together, both in the
classroom and at play. They will have established methods of interaction, and will have developed a
way to share ideas, compromise, and complete work on time. Secondly, in accordance with past
research, conflict situations occurred more frequently in the friendship groups than in any other group
situation (Berkowitz and Gibbs, 1983). Whilst not all friendship groups experienced conflict
situations, there were often instances when the children were critical about each other's ideas, which eventually resulted in the development of those ideas. Comments such as "no, we don't want no silly idea" and "no, don't do that, that's crap" led to more productive results. Sometimes the comments
would be substantiated with encouraging remarks such as "No, it sounds good the way it is....That
sounds best". The marks for the friendship interactions were considerably higher, probably because
these kinds of conversations led to extended discussions and the elaboration of ideas - important
criteria on the marking scheme. When critical comments, such as an idea being "silly" were made, no
negative feelings seemed to have been created, supporting suggestions from psychologists who
believe that criticism is received differently by friends than non-friends (Gottman and Parkhurst,
1980). In the groups of non-friends, negative comments became a lot more personal: "You're just
stupid then", "If you don't shut up, I'm going to put you down as being bossy" (on the questionnaire),
"Rosa, are you actually awake?", "You're not supposed to sing, you sound stupid". Remarks such as
these can hardly improve the quality of the composition, since they are negative criticisms regarding
personal characteristics of the individuals to whom they are directed. This obviously had an impact on
the group collaborations, since the non-friendship groups produced the lowest interaction scores.
Additional conflict situations in the groups of friends occurred due to leadership problems. If an
individual takes control, it may be the case that the other group members resent one person taking
over, or that the leader resents having to tell everyone what to do. This happened in school A, with
one group of girls. One of the girls was feeling terribly isolated, whilst one of the other group
members resented being expected to tell everyone what to do. Interestingly, the results achieved by
this group were the highest - some 4 marks above the other groups. This provides further support for
the findings of psychologists such as Kruger (1993), and Berkowitz and Gibbs (1983).
Gender was also found to have a significant influence on the quality of interaction. In the ability
groupings (which were mixed gender groupings), analysis of the interactions revealed that it was
usually the girls that took control of the activities, and assigned roles to each of the group members,
which is in direct contrast to the findings of previous research (Aries, 1976); they were more
conscientious, and would often make notes about their plans. In the mixed gender groups the children
would generally work within gender subgroups, and if they did collaborate with the whole group, they
would only do so after they had discussed their ideas separately. This supports the findings of Morgan
et al. (1997-98), who in a similar study found that mixed gender groups cooperated less well than
single sex groups.
CONCLUSION
The present study was designed to investigate the influence of friendship, gender, and ability
groupings on the standards of composition and interaction produced by children aged 10 and 11 from
two different schools. The results of the study suggest that grouping does not influence the quality of
composition produced, although it does have an effect on the interaction. Friendship groups were
found to produce the most effective group interactions. Gender was additionally found to be an
important influence on the interactions, with girls interacting more effectively than boys. No evidence
was found that higher quality compositions occur with higher quality interactions.
There has been little research directly related to the present study, although the findings
regarding gender influences on the group collaborations do agree with previous research by Morgan,
Hargreaves, and Joiner (1997-98) and friendship groups were found to produce more effective
interactions, supporting past research (Kruger and Tomasello, 1986). However, contrary to previous
research, ability was not found to be an important influence on compositional quality, although the
results suggest that matched ability groupings are not as effective as friendship groupings, thus
supporting the findings of Kulik and Kulik (1991).
There was a highly significant difference between the quality of compositions produced by the two
schools, in favour of the school which did not provide as many musical opportunities. This has
implications for teachers of music. School B's musical practices concentrated on theory and
instrumental teaching. This suggests that if a school chooses to
incorporate music into their lessons, then steps should be taken to ensure that the education is
well-balanced, and does not simply emphasise theory and performance.
Further implications for classroom teaching of music can be drawn from these findings. They suggest
that grouping can positively influence the work atmosphere by allowing individual group members to
share and develop ideas, which may consequently lead to personal learning, and possibly group
success. These findings suggest that this is indeed a topic deserving of further investigation.
REFERENCES
Aboud, F. E., & Mendelson, M. J. ‘Determinants of Friendship selection and quality: Developmental perspectives’. In
Bukowski, Newcomb, and Hartup (eds.) ‘The company they keep: Friendship in childhood and adolescence’.
Cambridge University Press, 1996.
Aries, E. ‘Interaction patterns and themes of male, female and mixed groups’ Small Group Behaviour 7: 7-18, 1976.
Bales, R. ‘Interaction Process Analysis: A Method for the Study of Small Groups’ Chicago, University of Chicago
Press, 1950.
Barnes, D., & Todd, F. ‘Communication and Learning in Small Groups’ London, Routledge and Kegan Paul, 1977.
Bennett, N., & Cass, A. ‘The Effects of group Composition on Group Interactive Processes and Pupil Understanding’
British Educational Research Journal, 15 (1), 19-32.
Galejs, I. ‘Social Interaction of preschool children’ Home Economics Research Journal, 2, 153-159, 1974.
Galton, M., and Williamson, J. ‘Group work in the Primary Classroom’ Routledge, 1992.
Hargreaves, D. J., Galton, M. J., & Robinson, S. ‘Teachers’ assessments of primary children’s classroom work in the
creative arts’. Educational Research, 1996, 199-211
Hallinan, M. T., ‘Patterns of Cliquing Among Youth’. In Foot, Chapman, and Smith ‘Friendship and Social Relations
in Children’, John Wiley and Sons Ltd., 1980.
Jones, M. G., & Gerig, T. M. ‘Ability Grouping in Classroom Interactions’ The journal of classroom interaction, 1994,
27-33.
Kruger, A. C. ‘The Effect of Peer and Adult-Child Transactive Discussions on Moral Reasoning’ Merrill-Palmer
Quarterly, pp. 191-211, 1992.
Kruger, A. C. ‘Peer Collaboration: conflict, cooperation, or both?’ Social Development, 1993, 165-180.
Kruger, A. C., & Tomasello, M. ‘Transactive discussions with peers and adults’ Developmental Psychology, 22,
681-685, 1986.
Mabry, E. A. ‘The Effects of Gender Composition and Task Structure on Small Group Interaction’ Small Group
Behaviour, 1985, 16 (1), 75-96.
Morgan, L. A., Hargreaves, D. J., & Joiner, R. W. ‘How do children Make Music? Composition in Small Groups’
Early Childhood Connections, 1997-98.
Plummer, A. D., & Dudley, P. (project leaders) ‘Assessing Children Learning Collaboratively’. Essex Development and
Advisory Service, 1993.
Seashore, S. E. ‘Group cohesiveness in the industrial work group’. University of Michigan Press, 1954.
Vygotsky, L. ‘Mind in society: The Development of Higher Psychological Processes’ Cambridge, Mass: Harvard
University Press, 1978.
Back to index
Proceedings abstract
A. Penel
penel@psycho.univ-paris5.fr
Background:
On the one hand, the process of sequential grouping, the use of temporal
regularity, and the possible incorporation of these into hierarchical
organizations have been demonstrated with simple auditory sequences. On the
other hand, Western tonal music presents two related structural components,
hierarchical grouping and hierarchical metrical structure, and the functioning
of these hierarchical organizations has also been demonstrated in the case of
music.
Aims:
Main contributions:
Implications:
Back to index
Proceedings paper
http://www.acad.carleton.edu/curricular/MUSC/faculty/jlondon/index.htm
1. Some preliminaries.
There is a growing body of work in the philosophy of music and musical aesthetics that has
considered the various ways that music can be meaningful: music as representational (that is, musical
depictions of persons, places, processes, or events); music as quasi-linguistic reference (as when a
musical figure underscores the presence of a character in a film or opera), and most especially, music
as emotionally expressive. Here I will focus on the last topic, for I believe it will be useful for
researchers in music perception and cognition to avail themselves of the distinctions that aestheticians
have worked out regarding the musical expression of emotion.
Now we often say that music is "expressive," or that a performer plays with great expression, but what
exactly do we mean? There are at least two things one may be saying. First, one may be praising a
performer for their musical sensitivity, that he or she has a keen sense of just how a passage is
supposed to be played. Such praise is often couched in terms of the performer's "musicality" (in
statements that border on the oxymoronic, as when one says that a performer plays the music very
musically). Such praise may also be couched in terms of expression--i.e., that a performer plays
"expressively." I have little to say about these attributions, save that they are often linked to the
second thing one often means when speaking of the music or a performance being expressive: an
expressive piece or performance is one that recognizably embodies a particular emotion, and indeed
may cause a sympathetic emotional response in the listener. Thus if one plays "expressively," this
means that the music's particular emotional qualities--its sadness, gaiety, exuberance, and so forth, are
amply conveyed by the performer.
Before going further, a number of other preliminary remarks are in order. When we speak of the
expressive properties of music, these are distinct from the expressive properties of sound. Sounds may
be loud, shrill, acoustically rough or smooth, and so forth. These acoustic qualities have expressive
correlates and may trigger emotional responses, and of course one cannot have music without sound.
But musical expression is more than this: it requires the attention to the music qua music, rather than
as mere sounds. The opening "O Fortuna" of Carmina Burana may shock (and indeed scare) the
listener due to its sudden loudness (especially when the bass drum starts whacking away), but this
shock isn't a musical effect--we get the same reaction when we hear a sudden "bang" at a fireworks
display or when a car backfires. By contrast, in hearing the opening of Mozart's 40th symphony as
having a quality of restless melancholy, one attends to both the musical syntax and its sonic
embodiment.
Another caveat: as Hanslick has noted, at times a musical work may arouse feelings in the listener
through ad-hoc associations. In other words, one must be on guard for the "they're playing our song"
phenomenon. These associative properties may be quite strong, and can operate in marked contrast to
the innate expressive qualities of a given piece, as in the paradigmatic case of a happy piece that
arouses sadness because it reminds the listener of a lost love or deceased friend. As will be noted in
some detail below, context plays a pivotal role, and here context can include not only genre, but
extra-musical information such as lyrics, the image track of a film score, and literary programs. I take
it, however, that a primary interest for researchers in the perception and cognition of musical
expression will be in the intrinsic expressive properties of the music itself.
Finally, in philosophical discussions of meaning and expression, there is usually what might be called
"the inter-subjective agreement requirement." Here is an example from visual art. If I show you a
picture of a man on a horse, and you and everyone else says "that's a man on a horse," this confirms
that the picture is a successful representation of a man and a horse. Moreover, I don't have to give you
any cues or hints regarding its representational subject. By the same token, in order for a piece of
music to be "an expression of emotion X" (or "expressive of X") there must be broad consensus
among listeners that the music expresses X, a consensus arrived at without any extra-musical
prompting. One problem for accounts of musical expression is that such inter-subjective agreement
often does not happen: one listener says a given piece is an expression of anger, while another says it
expresses hate, another jealousy, and yet another sinister passion. What emotion does this piece
express? While anger, hate, jealousy, and sinister passion are related emotions, the piece nonetheless
fails to individuate any one of them in particular. Musical expression is plastic enough so that the
same passage might be expressive of a wide variety of emotional states.
2. Simple Emotions, Higher Emotions, and Moods
In the late 19th century Eduard Hanslick famously denied that music had any ability to express
emotions, and many 20th century aestheticians (and composers, most notably Stravinsky) held this to
be true. Why would one take up such a counter-intuitive view? Well, philosophers often take up
counter-intuitive views, and if you are a philosopher, there are two problems to be surmounted if one
wants to claim that a piece of music expresses a particular emotion. The first is the "who" problem:
whose emotion is being expressed? Emotions are felt by living, sentient creatures, and as Malcolm
Budd has noted, "It cannot be literally true that [a piece of] music embodies emotion, for it is not a
living body" (Budd, Music and the Emotions, p. 37). One is thus tempted to claim that a piece of
music is an expression of its composer's emotion. But when one examines the compositional history
of most works this claim also falls apart, for composers often write sad music, for example, even
when they feel no particular sadness (as in the case of the Funeral March from Beethoven's "Eroica"
symphony). Nor are they in the throes of sadness during the entire course of composing a piece of
music, since the compositional process may last weeks, months, or even years (see, for example, the
Adagio Mesto movement of Brahms' trio for horn, violin, and piano, which is purported to be an elegy
to his mother). Thus if pieces of music are expressions of emotion, they are disembodied, and usually
disconnected from any particular "emotional cause" in the life of their composer.
The second problem is the "why" problem: emotions typically require what are referred to in
philosophical parlance as intentional objects, that is, particular people or events that play a causal role
in triggering an emotional state. Thus we are jealous of a particular person, frustrated at a particular state
of affairs, feel grief at the death of a particular friend or relative, and so forth. One does not, for
example, feel "jealous" in general (though one may have a disposition toward jealousy).
Not all emotions are like jealousy and frustration, as some do not always require intentional objects.
While one can be sad due to a particular event, one also can be generally sad, for example, and such
sadness is not dependent upon any particular person, state of affairs, and so forth. As Colin Radford
has pointed out, "not all emotions, or occasions of emotion are rational, i.e., they are not informed by,
explained and justified by appropriate beliefs [that is, intentional objects]" ("Muddy Waters," p. 249).
Radford also explicitly acknowledges that "we naturally call such feelings 'moods.'" ("Muddy
Waters," p. 250). Thus there is a distinction between higher emotions (which require an intentional
object) and simple emotions and moods, which may not or do not require one.
There is now general consensus that music can express moods and simple emotions, contra Hanslick.
Some aestheticians, most notably Jerrold Levinson, have claimed that in some musical contexts music
can do more, in that it is capable of mimicking the characteristic "look and feel" of at least some of the
higher emotions (see Music, Art, and Metaphysics, chapter 14).
3. How music expresses emotions I: Cognitivism
But just how does music express simple emotions? There are two main points of view on this
question. The first, developed (and much defended) by Peter Kivy, is known in philosophical circles
as "cognitivism" or "cognitivist" theories of musical expression. The second, one with a long
historical pedigree, can be termed "emotivist" or "arousal" theories of musical expression. Taking up
the cognitivist charge, Kivy has repeatedly denied that music really arouses what he has termed the
"garden varieties" or real-world instances of sadness, happiness, anger, and other simple emotions in
the listener (though music may move the listener through its sheer beauty). For even simple emotions,
when fully aroused, usually relate to an intentional object. Thus if we say that a piece of music makes
us sad or angry, what exactly are we sad or angry about--the music? ("that damn Symphonie
Pathetique!") or its composer? ("that damn Tchaikovsky!"). And as has already been noted, a piece that
seems expressive of happiness may actually trigger sadness due to extra-musical associations.
For the cognitivist, the expressive properties of music are properties intrinsic to the music, and not, to
quote Kivy, "dispositions to arouse emotions in [the] listener" ("Feeling the Musical Emotions," p. 1).
Kivy takes this position from O. K. Bouwsma, but he also acknowledges psychological antecedents
for this view, in particular Charles Hartshorne's The Philosophy and Psychology of Sensation (1934),
and Kivy cites Hartshorne's observation that 'Thus the "gaiety" of yellow (the peculiar highly specific
gaiety) is in the yellowness of the yellow' (see ibid., note 2, p. 1). In making this move, one allows
that music that is expressive of sadness need not make the listener sad.
How exactly does music then express emotions if not by arousing them in the listener? Here Kivy,
Levinson, and many others would agree with this explanation given by Malcolm Budd (who takes this
view in large part from the music psychologist Carroll Pratt): "music can be agitated, restless,
triumphant, or calm since it can possess the character of the bodily movements which are involved in
the moods and emotions that are given these names" (Music and the Emotions, p. 47). Likewise Kivy
develops a "physiognomy of musical expression" and thus claims that music is expressive of these
basic emotions by its resemblance to human utterance and behavior. Music thus distills certain aspects
of human expressive behavior, especially that of the voice, and renders those aspects into dynamic
musical shapes. Levinson's claim that music can express some higher emotions (such as hope) is
based on the claim that some higher emotions have characteristic physiognomies that can be musically
portrayed (see ibid.).
Note, however, on this view that in order for the contours of a musical phrase to express an emotion,
one must recognize that this "musical utterance and behavior" is akin to other, non-musical utterances
and behaviors. Thus musical expression is mediated through our understanding of social behavior in
general, and what might be termed a knowledge of "social musical behavior" in particular. It is for this
reason that one may mistake musical expressions in an alien musical culture, not because we do not
know the musical language, but perhaps primarily because we do not know the normative social
behaviors onto which the musical gestures may be mapped.
To sum up so far: the "cognitivist" theory of emotional expression in music says that a piece of music
expresses a particular emotion if a suitably grounded listener is able to recognize that emotion in the
musical structure by analogy to human social behavior, but she need not assume that this emotion was
felt by the composer (or is felt by the performer), nor does the listener have to experience that emotion
while listening.
4. How music expresses emotions II: Emotivism
For many other aestheticians, cognitivism is a necessary but insufficient account of emotional
expression in music. As Jenefer Robinson has noted, not only does music frequently express
emotional qualities, it also frequently affects us emotionally by evoking or arousing emotions in the
listener ("The Expression and Arousal of Emotion in Music," p. 13). But what kinds of feelings does
music arouse? Are they the same as our "ordinary" emotions, or are they special "musical versions" of
emotions? And what is their relationship to our understanding of musical expression?
A common tack taken by a number of philosophers has been to claim that music arouses our
emotions, but in a special way. For Kendall Walton, who approaches all kinds of aesthetic experience,
and not just music, as a special kind of imaginative activity, expressive music "evokes the imaginative
experience of the emotion expressed: more precisely, music expressive of sadness, say, induces the
listener to imagine herself experiencing sad feelings" (this cogent summary of Walton is from Jenefer
Robinson, op. cit., p. 18). In other words, for Walton our emotions aren't really aroused, but we
imagine they are. For Stephen Davies and Jerrold Levinson, expressive music really does arouse the
listener's emotions, but emotions of a greatly attenuated kind--"sadness lite", for example. As Kivy
has noted with respect to their theories, such emotional arousals "must be weakened, . . . because they
do not have the power to make us behave the way those emotions would do in ordinary
circumstances" ("Feeling the Musical Emotions," p. 11). For Kivy, champion of cognitivism, this is
inadequate. We do not have imaginary or stunted emotional responses when we listen to expressive
music, but real, full blown feelings--albeit feelings of a special kind. For Kivy, what moves us is sheer
musical beauty, and this beauty may be emotionally individuated: "Sad music emotionally moves me,
qua sad music, by its musically beautiful sadness, happy music moves me, qua happy music, by its
musically beautiful happiness, [and so on]" ("Feeling the Musical Emotions," p. 13). For all of these
philosophers, however, what we do not experience when we listen are ordinary feelings of sadness,
happiness, serenity, or so on. Musical emotions are always of a different order.
Jenefer Robinson takes a different approach, one that tries to avoid making musical expression a
special case. She considers most carefully what we really do feel when we hear expressive music, and
then what we make of those feelings: "As I listen to a piece which expresses serenity tinged with
doubt, [for example], I myself do not have to feel serenity tinged with doubt, but the feelings I do
experience, such as relaxation or reassurance, interspersed with uneasiness, alert me to the nature of
the overall emotional expressiveness in the piece of music as a whole" ("The Expression and Arousal
of Emotion in Music," p. 20). Robinson takes care to note that "the emotions aroused in me are not the
emotions expressed by the music" (p. 20), and so for her it is not simply that sad music arouses
sadness. Rather, our basic feelings--or perhaps "reactions" is a better term--of tension, relaxation,
surprise, and so forth, are combined with our awareness of the musical gesture and syntax, and
through this combination we gain a sense of what emotion(s) a piece may express.
5. Conclusion and implications for research
As is now clear, while philosophers of music generally agree that music can express at least some
emotions, there is much disagreement as to which particular emotions can be expressed, whether or
not such expression depends upon arousing an emotional response in the listener, and if so, what kind
of feelings exactly music does arouse. Nonetheless, philosophical discussions of musical expression
have a number of implications for research in music cognition and perception.
1. Musical expression always involves sonic properties, and to things like loudness and
roughness I would add the rhythmic properties of sounds (as indicative of coordinated
movement, spatial location, and so forth). Moreover, alterations to the "sonic" properties
of a musical passage may be made without changing its basic melodic or harmonic
structure--the same melody and accompaniment played high, fast, and loud may convey a
vastly different expressive character from its low, slow, soft version (the locus classicus
of such variations is the various presentations of the idée fixe in Berlioz's Symphonie
Fantastique).
2. If one uses "real world" musical stimuli, especially well-known repertoire, one will
often be faced with "associative interference," as one cannot control the contexts in
which subjects have first heard and come to know such repertoire. Therefore in many
cases newly composed or otherwise unfamiliar musical stimuli may be preferable, as they
circumvent such interference.
3. While music alone may only express a garden-variety emotion, such as anger, that
same music in a richer semantic context may be properly heard as an expression of
jealousy or hate. Different visual and/or linguistic cues will give different expressive
results. Moreover, a level of musical activity that is most apt for one particular emotion
may be inapt for another. For example, a passage that expresses "anxious anticipation"
very well will not be made more expressive by making it louder, faster, and so forth.
There isn't a simple linear relationship between musical parameters and the robustness of
an emotional expression.
4. Some perfectly good musical expressions of emotion may not arouse those emotions
(or much of anything, for that matter) in the listener. Yet it would be incorrect to call
such passages "inexpressive."
5. Any emotions that are aroused by listening to music, while perhaps similar to "real"
emotions that occur in non-musical contexts, nonetheless have important differences.
Even if context provides an intentional object for an emotion, transforming a yearning,
longing passage into an expression of hope (to take an example from Levinson), it is not
at all clear that the listener should feel hopeful, what she should be hopeful about, and so
forth. Moreover, such hope (and its emotional stimulation) is commingled with other
aesthetic properties--balance, beauty, intensity, coherence--and those properties may (and
most certainly will) also stimulate affective responses of their own.
Works Cited
Budd, M. (1985, 1992). Music and the Emotions: The Philosophical Theories. New York,
Routledge.
Davies, S. (1994). "Kivy on Auditors' Emotions." The Journal of Aesthetics and Art Criticism
52(2): 235-36.
Goldman, A. (1995). "Emotions in Music (A Postscript)." The Journal of Aesthetics and Art
Criticism 53(1): 59-69.
Graham, G. (1995). "The Value of Music." The Journal of Aesthetics and Art Criticism 53(2):
139-53.
Kivy, P. (1989). Sound Sentiment. Philadelphia, Temple University Press.
Kivy, P. (1993). "Auditor's Emotions: Contention, Concession and Compromise." The Journal
of Aesthetics and Art Criticism 51(1): 1-12.
Kivy, P. (1994). "Armistice, But No Surrender: Davies on Kivy." The Journal of Aesthetics and
Art Criticism 52(2): 236-37.
Kivy, P. (1999). "Feeling the Musical Emotions." British Journal of Aesthetics 39(1): 1-13.
Levinson, J. (1990). Music, Art, and Metaphysics. Ithaca, Cornell University Press.
Martin, R. L. (1995). "Musical 'Topics' and Expression in Music." The Journal of Aesthetics
and Art Criticism 53(4): 417-24.
Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago, University of Chicago Press.
Radford, C. (1989). "Emotions and Music: A Reply to the Cognitivists." The Journal of
Aesthetics and Art Criticism 47(1): 69-76.
Radford, C. (1991). "Muddy Waters." The Journal of Aesthetics and Art Criticism 49(3):
247-52.
Robinson, J. (1994). "The Expression and Arousal of Emotion in Music." The Journal of
Aesthetics and Art Criticism 52(1): 13-22.
Back to index
Proceedings paper
In a recent book (Addis, 1999), I developed a theory of emotion in music that, in the spirit of Susanne
Langer’s work of midcentury (Langer, 1942), maintains that passages of music represent emotions to
the listener. On my version, this fact about music and humans rests on certain facts of our human
nature and on what I call the ontological affinity of consciousness and sound. In short, that music does
represent emotions and possibly other states of consciousness to us is grounded in our being the kind
of species we are and the basic natures of mind and music. The representation, therefore, is not purely
conventional like that of language, nor is it purely natural like that of thought itself. But because it is
more nearly natural than conventional, I describe it, somewhat infelicitously, as a quasi-natural
representation.
These two features--ontological affinity and quasi-natural representation--are, or may be, sufficient to
account for the power of music in human affairs. But I also follow Langer in holding that, as a further
aspect of its power, music is able to represent--or "express," as might be the more natural word in this
context--certain aspects of reality that language cannot express. More precisely put, we may say that
there are some nuances or subtleties of emotion that music can, but language cannot, capture. This is
sometimes called the ineffability thesis even though, strictly speaking, that is a theory only about what
language cannot do, not one about what music can do. Langer’s version of the ineffability thesis is
much more radical than mine in that she holds that there are forms of states of affairs in the world
itself that are in principle unrepresentable in language because they are not of the subject/predicate
form. My much milder version is only that, while all states of affairs including those involving
emotion are of the subject/predicate form, no language contains the predicates (or a means of
generating the predicates) that would be adequate to the expression or description of all the subtle
differences of mood and emotion that humans are capable of experiencing or conceiving. Nor is it that
music can, even in this domain, do everything that language cannot do, but only that it can do some of
what language cannot do.
A friendly reader and critic, Bruno Repp, while not necessarily disputing my conclusion, has
suggested that I have not yet made my case with respect to the limits of language as compared to
music. My failure, he says, derives from the fact that (and here I quote from a private communication,
with his permission) "whenever you refer to language in relation to music, with regard to ineffability
for example, you ... seem to have in mind the grammatical and semantic properties of language, even
when referring to poetry. There is no reference to the prosody of spoken language--its rhythm,
intonation, volume, timbre, tempo, and so on. It seems to me that there are close analogies between
language prosody and music, but you did not touch at all on these analogies in your arguments"
(Repp, 1999).
To a failure to consider matters I should have considered, I do plead guilty. But assuming that the
results I would have come to then had I considered the relevant analogies and the properties of
language that Repp mentions are the same as those I shall arrive at in this paper, my conclusions
would have been unchanged. At the same time, consideration of these matters will, if I am successful,
enrich our understanding of the multiple ways in which emotion "connects" (to use an intentionally
vague word) with both language and music. And I also want to use the occasion to speak of some
broader issues concerning what language is in its spoken and written forms, what analogies there may
be between reading and performing, and other matters. I shall eventually conclude that there are three
major ways in which emotion connects with both music and language, but that their relative
importance to each other is quite different between language and music.
Let us begin by fixing firmly in mind a distinction that is crucial to the understanding of both
language and music but which is frequently ignored and sometimes even denied, at least for music.
Consider the following passage that might appear in a probably very bad short story: "Upon reading
the editor’s letter of rejection, Ernest cursed, crushed the letter in his hand, threw it into the fire, and
burst into tears." The emotions represented in this passage are clearly disappointment and anger, but
the emotions you felt on hearing and understanding the passage were more likely mild amusement,
possibly slight pleasure, or maybe just indifference. The point is, of course, that the emotions
represented by the passage are not the same emotions as those felt by the listener. Exactly the same is
true of music. The emotions I feel when I hear and understand sad music may be--if, for example, my
daughter is the performer and doing it well--pride, happiness, and excitement. I recognize the music as
expressive of sadness, but I certainly do not feel sad in so doing. We have before us, then, the
fundamental distinction between the emotions that a given piece of music expresses or represents on
the one hand and the emotions that the music, as heard by a particular person on a particular occasion,
arouses in that person. My theory, like Langer’s, is primarily about how music expresses, and not how
it arouses, emotion. Still, we now have before us two of the three major ways in which music and
language connect with emotion: by expressing it and by arousing it. But before we come to the third
connection, let us consider the nature of language and its relation to emotion.
Language is at once one of the most familiar and one of the most recondite of phenomena for human
beings. Language is an intimate part of our lives, playing its role in most of our waking hours in either
spoken or written form, and is the primary mode of communication and perhaps of interaction
between and among the members of our species. What could be "closer" to us than language? Putting
aside the fascinating intricacies and mysteries of language learning and the role of language in our
evolutionary heritage, I want to attend momentarily to the seemingly mundane question of what it is
to be a word. The simple, tempting answer is that it is to be a sound in the case of spoken language
and a shape in the case of written language; and for many purposes that answer, false though it be, is
adequate. That it is a false answer, or at least an incomplete one, can be seen immediately by asking
ourselves why it is that certain sounds and shapes are, but others are not, words. The answer, of
course, is that the privileged ones are the ones that have meaning; and if we ask what it is for a sound
or a shape to have meaning, we see that in no case is it some intrinsic property of the sound or shape
but instead something in us, and essentially so. Thus a word is, in some way or other, a combination
of publicly observable sound or shape and some feature of human consciousness. Simply as sound or
shape, nothing is a word, that is, a part of language. What I want to stress here is that the notion of
language apart from consciousness is unintelligible; nothing is language or a part of language except
as including consciousness in some way.
Now let us turn to music again and consider briefly what a score is. Not all music, probably not even
most music that humans have ever produced, involves scores just as most instances of spoken
language have no prior written form. We needn’t ponder the many senses of the word ‘music’ to agree
that in probably its most important sense, music is sounds of a certain sort. In that sense, a score is not
music; and its creator, the composer, does not create music. Speaking in this way, we might better say
that the composer creates instructions for producing instances of a piece of music much as the person
who writes down a recipe for a new soup provides instructions for making instances of a soup
without, thereby, making soup. When the production of music does result from the prior construction
of a score--that is, a physical object that, according to certain conventions "tells" the performer to
produce certain sounds--then music is, as Nelson Goodman puts it, a "two-stage" art (Goodman,
1976); and we recognize the creators of both stages--composer and performer--as artists, each with a
creative task to be undertaken. We might note that what is achieved by the use of scores could in
principle be accomplished by written or even spoken language, for there is nothing in what a score
"tells" a performer that cannot be said in language. Keeping this fact in mind will diminish the
temptation to think of a score as music in any literal sense.
What I have just said about scores may be obvious and uncontroversial, but I want now to argue
something that is by no means uncontroversial--that what we call written language is not really
language at all in just the sense that a score is not really music. I have long harbored the view or
suspicion that not just drama but also poetry and even the novel in the literary arts are best conceived
as performing arts (maybe all arts are performing arts!) not because I believe with the
deconstructionists and other postmodernists, as I most emphatically do not, that there is no fact of the
matter with respect to meaning, but because a poem or a novel exists as such, that is, as something
with meaning, only when it is being read or recited. The book as physical object is not a novel--it is
just ink shapes on paper--but, instead, instructions for creating the novel which exists only when
someone is in the act of reading or reciting the novel. There are also the limiting cases of reading
silently and reciting in one’s mind which are analogous, respectively, to going through a score and
hearing the music in one’s mind, as we say, and running a piece through one’s mind. These activities
are extensions of speaking language and performing music, not of written language and score, and it is
only spoken language, including sign language, that is literal language just as sounds alone are literal
music.
If we do include reading to oneself, silently or not, and hearing music in one’s head as limiting cases
of speaking and making music, then the sonata and the poem may already have existed when the
composer and the poet were putting ink marks on paper. Be that as it may, we can affirm that nothing
is taken away from the significance of the novelist or poet, any more than from the composer, if we
insist that in a certain sense all they do, at least publicly, is to put certain shapes on paper or to cause
them to be so put. For in doing so, they make possible, through elaborate conventions, for certain art
objects, literary and musical, to come into existence, and that is achievement of the highest order for
our species.
No score is a piece of music; no book (as physical object) is a novel. Music and novels exist only as
inherently temporal objects, as sounds or the auditory images of sounds and the meanings that attach
to those sounds, either conventionally or naturally. Thus I come to the intermediate conclusion,
perhaps already vaguely known to us, that the issues of the connection of emotion to music and
language pertain only, or almost only, to music as sounds and to language as spoken, not to music as
score or language as written. And with that we must turn now to those issues directly.
Suppose, to modify our earlier example, that Ernest is not a character in a story but someone whom a
friend of yours likes very much but whom you, secretly, intensely dislike. Having just observed
Ernest, your friend says to you, in a tearful voice, that upon reading the letter from the publisher,
Ernest cursed, crushed the letter and threw it into the fire, and burst into tears. In this case, as before,
we can distinguish the emotions represented by the content of those words--the disappointment and
anger of Ernest--from the emotions aroused in the hearer of the words. The latter, because you dislike
Ernest and also consider him to be a terrible writer, we may imagine to be pleasure and satisfaction.
Thus again we have our distinction between emotion expressed or represented and emotion aroused.
But what about the emotion of the speaker? When I described your friend’s voice as a tearful
one--alluding thereby to its pace, volume, pitch, intonation and whatever other such properties make
for a tearful voice--I was indicating that the speaker was expressing some emotion--sadness, we may
gather--by the manner in which those words were uttered. Notice that I have used the language of
expression in two of the three aspects, and we must understand clearly that the sense in which the
words expressed Ernest’s anger is very different from the sense in which they expressed your friend’s
sadness. For your friend felt sad and that feeling was manifested in how she spoke. But words, either
as sounds or as sounds plus meaning, don’t feel anything. So let us drop the language of expression
and speak simply, as I already have, of emotion represented (Ernest’s anger), emotion aroused (your
pleasure), and emotion manifested (your friend’s sadness). And notice, before we turn again to music,
that those features of language to which Repp called attention and which I ignored in the book
contribute not to the emotion represented by the speaker’s words but to the manifestation of the
speaker’s own emotion. And of course it is extremely important in human affairs to be able to discern
another person’s emotional state from that person’s manner of speaking, the inability to do so being
one of the main symptoms of autism.
How, if at all, does this three-fold distinction apply to music? If it does apply, is it in a way similar or
analogous to that of language? When an orchestra performs the funeral march of Beethoven’s 3rd
Symphony, sadness is being represented. What emotions are aroused in the listeners--happiness, pride,
nostalgia, envy, maybe even also sadness--will depend on the particular listener’s circumstances. Is
there also emotion manifested, presumably by the performers? Does it matter, insofar as this is a
performance of music? The power and accuracy of the representation of emotion by the music will
vary from performance to performance as those performances themselves differ. And the relevant
differences in performance will, obviously, depend mostly on properties of the performer, although
also on the nature of the instruments, the temperature and humidity, the acoustical properties of the
place of performance, and other factors. Of the factors that are properties of the performer, some will
be that person’s emotional states. Thus, some differences in performances are due to differences in
emotional states of performers; and all performances, we may surmise, depend in part on performers’
emotional states, at least in the sense that if their emotional states were radically different from what
they are, the performances would be different.
But does the performance manifest the performer’s emotional state? That is, can a normal or typical
listener, whatever that might exactly be, make a reasonable conjecture as to the performer’s emotional
state from the character of the performance, much as you could discern your friend’s sadness from the
character of the utterance of her words to you? If you know the performer well, and especially if you
include visual as well as auditory properties as those of the performance, you may well be able to
discern the performer’s emotional state with some measure of reliability. But if we know nothing of
the performer and must rely only on what we see and hear in the performance, and especially if only
on what we hear (which is, arguably, all that really constitutes the performance), we can tell very
little; if we do make conjectures, we may learn later that they were often mistaken.
But now I ask again whether or not the manifestation of emotion or even its existence in the performer
is of any musical significance? It is one of the leftover myths of Romanticism that the important thing
about musical performance is that the performer is expressing him- or herself in performance. It is a
myth, not because it is false that the performer is expressing in performance, but because it is trivial
and irrelevant. It is trivial because almost everything a person does voluntarily is an expression,
intentional or not, of some aspect of oneself. It is irrelevant because the value and meaning of the
music and its performance lies in its observable characteristics and not in how it came about; it is only
a causal question of how the inner states of the performer were relevant to the observable
characteristics of the performance and only a biographical question as to what those inner states were.
Yet the myth is so powerful that most people, including most musicians, are unable even to conceive
any way to think of music except as expression by the performer whereas before the eighteenth
century almost no one would even have understood the idea of music as personal expression. That fits
well with--indeed, it partly constitutes--not only the ideals of Romanticism that continue to plague our
musical culture but also with the radical individualism of our ideological culture.
Be all that as it may, the lesson I wish to draw from these reflections is that while the phenomenon of
manifestation of emotion certainly occurs in the performance of music, it is essentially irrelevant to
grasping the character and meaning of the music and to the realm of music generally. Again, as a
matter of causal fact, what comes out of us depends on what is inside us including our emotional
states. But music must never be understood as about the emotions of the performer (except per
accidens) nor, except in the trivial sense, as an expression of those emotions. We rightly praise the
composer and the performers for their role in bringing about music; but what emotional states they
happened to be in while exercising their roles is only of biographical, never musical, interest.
If we now ask why there should be such an asymmetry between music and language with respect to
the manifestation of emotion, the easy and obvious answer is that in the typical use of spoken
language, the state of mind of the speaker is just what the listener is presumably interested in--the
speaker’s beliefs, intentions, values, and emotional states. The content of what is said goes mainly to
belief, the manner of speech mainly to emotional state, and intentions and values somewhere in
between, so to speak. But in music, what corresponds to manner of speaking--pace, intonation,
volume, pitch, and so on--are themselves part of the content and therefore of the meaning of the
music. This fact also explains why it is so difficult, in observing a performance, to distinguish the
properties of it that may be manifestations of the performer’s emotions from those that, intentionally
or not on the part of the performer, contribute to the emotion represented by the music and why,
except in extreme cases, we are ordinarily entitled to take the properties of the performance as
representing and not manifesting emotion.
At this point someone may object that the relevant analogue in language to the performance of music
is not everyday conversation of the sort of my example but instead what are also regarded as
performances such as public readings of poetry. My answer is that, to the extent to which in such
cases the manner of speech does contribute to the meaning of the poem being read, it is a musical
performance. The Greeks, if I may pretend to be learned for a moment, characterized any art in which
the Muses presided as mousikē technē, which is the origin of our very word ‘music’. But perhaps I
can soften the seemingly stipulative character of this reply by observing that, in this highly atypical
use of spoken language, it is very difficult, as in musical performance, to disentangle which aspects of
the reader’s utterances are to be taken as part of the content of the poem and which the manifestation
of the reader’s emotional states while, unlike the case of music, it becomes very important to do so.
But nothing important rests on where exactly, in what we now may take to be something of a
continuum between what we call language and what we call music, we place poetry reading, acting,
lying, ceremonial speaking, and all of the other many ways in which we use language. But we may
certainly say, and even stress, that when a manner of speech contributes more to the meaning of the
words than to the manifestation of the speaker’s emotional state, we have a special use of language
that, in a very important respect, is more like music than the typical use of language.
We return finally to the matter of ineffability and my claim that music can represent to us some
differences of emotion and mood that language cannot. Repp correctly pointed out that I had ignored
the prosody, in a broad use of that word, of language in making my claim and wondered, at least
implicitly, whether or not taking it into account would tell against the claim of ineffability. What we
now see, if I am correct, is that while those prosodic features do have a connection with emotion in
the use of language, they are typically that of manifestation rather than representation of emotion.
Because of this fact and because, as we also saw, the prosody of music does contribute primarily to its
content and therefore its meaning, the case for the power of language to represent anything music can
represent is not strengthened by taking those features of language into account. This, obviously, does
not prove that music actually can represent something that language cannot, but it removes one
apparent consideration to the contrary.
In fact, I have not here made any substantive argument that music really represents anything at all,
much less that what it represents is mood and emotion. Those arguments are in the book. But perhaps
enough has been said to conclude that in whatever the meaning of music is to be found, it is not in the
emotions of either the listeners to, or the performers of, music. This may sound odd, even
counterintuitive, given the way most people casually think and talk about music. But I am convinced,
and hope in the foregoing remarks partly to have demonstrated, that the understanding of music and
its power in human affairs requires that we look beyond the comparatively superficial facts of the felt
emotions of the participants in music to something that is represented to us by music.
REFERENCES
Addis, Laird (1999). Of Mind and Music. Ithaca: Cornell University Press.
Langer, Susanne (1942). Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art.
Cambridge: Harvard University Press.
Goodman, Nelson (1976). Languages of Art: An Approach to a Theory of Symbols. Indianapolis:
Hackett.
Repp, Bruno (1999). E-mail message to Laird Addis.
Proceedings paper
1. Introduction
Our ability to parse the acoustic array into its many meaningful sound objects relies on our capacity to process sensory information. Variables that influence the perception
of sonic entities in everyday life can be broadly organized into three sets. The first comprises subject-dependent factors such as attention, memory and knowledge, as well as
gestures and postures. The second comprises environment-dependent factors such as masking and the interference that can arise from interactions among different sonic events. The
third includes the cognitive correlates of the physical parameters that define each sonic event.
The possibility of interacting with these variables helps us respond to specific tasks in cognitive processing. We can modify our attention or posture, and we
can move our body so as to optimize hearing conditions. In the same way, we can change cognitive strategies in order to fulfil different needs and
improve our adaptability to different situations. Under certain circumstances we can improve perception by partially modifying the acoustic environment around the
sonic event in order to attenuate the interference from other events. Finally, we can directly alter the sonic event itself by changing some of its parameters.
By means of these actions we can affect the segmentation of the sonic surface into cognitive units useful for further cognitive processing, regardless of cultural
differences (Dowling & Harwood, 1986). Such units interact as structuring forces in the processes underlying the definition of the cognitive correlates of a
sonic event.
A typical example of partial modification of sound parameters can be found in musical practice, when musicians gradually modify the result of their sight-reading by
adding nuances and micro-variations to their performance. The role of such variations has been studied in several computer models of musical performance
(Friberg & Sundberg, 1995). Generally, these models yield more convincing results for tonal than for non-tonal music. The weaker results for several twentieth-century
compositions may be linked to the lack of coherence between the structuring rules defined by the composer and the strategies listeners apply during the
performance of the composer's work (Imberty, 1996).
To address this issue it may be useful to rely on concepts that are applicable in any acoustic context rather than on notions linked to specific musical
grammars (e.g. the link between harmony and tonal music).
Within this framework, the aim of this study is to investigate the concepts of continuity and discontinuity. The interest in these categories is twofold.
1. Continuity and discontinuity may be considered as perceptual qualities that derive from variations on many of the dimensions of a sonic event.
2. A Continuity/Discontinuity framework
Several studies stress the relation between variation on certain dimensions of a sonic event (frequency, amplitude, duration) and listeners' ability to organize the
acoustic surface into meaningful "chunks" or segments.
Segmentation of a sonic event has been linked to amplitude (Fraisse, 1974; Vos, 1977; Gabrielsson, 1973), frequency (Tekman, 1998), duration (Vos, 1977)
and timbre variations (Dowling, 1973).
Just as pitch and loudness are perceptual correlates of frequency and amplitude, so discontinuity and continuity may be the perceptual correlates of the
presence or absence of perceptible variations on these or other dimensions.
From this point of view, therefore, subjects' ability to extract cognitive units during listening may be related to their ability to detect
discontinuities and continuities on multiple acoustic dimensions.
As a consequence of their cognitive nature, continuities and discontinuities may also emerge in the absence of any notable acoustic variation. While listening to a regular
sequence of pulses, listeners often add discontinuities that lead to the perception of "strong" and "weak" pulses. Moreover, discontinuities on one dimension may
emerge from variation on a different dimension: frequency, amplitude and duration interact strongly in simple acoustic stimuli (Tekman, 1997).
The perception of continuities within a stream of sensory information leads to compression and, consequently, to a reduction of the information.
In this case the outcome of cognitive processing is influenced by the subject's skill in filtering out non-relevant information. An example of such selective listening is
Cherry's classic "cocktail party effect" (reviewed in Wood & Cowan, 1995). Selective listening is also crucial in auditory scene analysis, when we face the problem of
separating sounds that reach the ears simultaneously (Bregman, 1990). In other words, the less selective the information reduction is, the more continuity perception
leads to a generalized deterioration of the information. On the other hand, selective information reduction facilitates cognitive processing for those entities that exhibit
temporal coherence: the attenuation of meaningless variations permits the perception of the event as a "whole". Moreover, the unitary perception of one dimension may ease
the processing of other dimensions (e.g. continuity in pulse may facilitate melodic or harmonic processing). For instance, because of possible interference and distortions,
the amplitude variations of speaking voices are often compressed in radio transmission in order to guarantee better comprehension of the spoken text.
Subjects tend to preserve perceived continuities until a new variation in the information stream is detected. This behaviour is consistent with listeners' tendency to
form expectations influenced by past information (Dowling & Harwood, 1986).
On the other hand, the perception of discontinuity determines an increase of information. In this case the outcome of cognitive processing relies on the effectiveness of the
strategies the subject uses to organize information. For example, the more different pitches a composer uses, the stronger the correlations among pitches must be
for the listener to perceive them and process the acoustic information.
Correlation is more evident when discontinuities on two different dimensions take place at the same time (e.g. a simultaneous change in pitch and duration). When that
happens, the information is amplified by means of distinctiveness: stressed variations lead to the perception of the sonic object in terms of multiple, clearly marked units.
The second pulse (p2) considered in the model is the binary or ternary multiple of p1. In this case, an evaluation of the correlation between pulses and
previously defined discontinuities is invoked in order to assign the preference: the more coincidences there are between the pulse and the tones marked during the
discontinuity detection phase, the more probable that p2 pulse will be.
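The preference step just described can be sketched as follows. This is a hypothetical illustration, not the model's actual implementation: the names (`p1`, `marked_onsets`) and the representation of tone positions as integer time indices are assumptions made for the example.

```python
# Hypothetical sketch of the pulse-preference evaluation: score the binary
# and ternary multiples of p1 by how many of their pulse positions coincide
# with tones marked during the discontinuity detection phase.

def count_coincidences(period, start, end, marked_onsets):
    """Count the pulse positions that fall on discontinuity-marked tones."""
    pulse_positions = set(range(start, end, period))
    return len(pulse_positions & set(marked_onsets))

def prefer_p2(p1, start, end, marked_onsets):
    """Choose the binary or ternary multiple of p1 with more coincidences."""
    scores = {m: count_coincidences(p1 * m, start, end, marked_onsets)
              for m in (2, 3)}
    best = max(scores, key=scores.get)
    return p1 * best, scores
```

With marked tones on every second time unit, for instance, `prefer_p2(1, 0, 12, [0, 2, 4, 6, 8, 10])` assigns the preference to the binary multiple.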
Pulses are marked above the notes. Detected changes in pulse subdivision are marked with an 'x'; otherwise an 'o' is annotated. A change in pulse occurring only on
the last note of the excerpt, due to an evaluation of its length, has not been considered.
Segment definition
Early research (reviewed in Fraisse, 1982) has shown that the extreme temporal limit for the perception of a single group cannot exceed about 5 seconds. In accordance with these
results, our model allows continuity/discontinuity amplification only on three temporal layers. The longest (L3) coincides with the beginning and the end of the 7-9 sec.
melodic fragment and is not marked on the score. The second (L2) divides the excerpt into two or three segments and is marked. The shortest (L1) divides each L2
segment into 2-3 sub-segments. A further sub-level (L0) is not considered in the amplification phase but is used to generate segments that are subsequently grouped on
the L1 layer.
The segment production step is threefold. First, all possible L0 segments are produced by means of a small set of rules. Second, the produced segments are reduced by
means of constraints. Third, segments are grouped on the L1 and L2 layers through a further set of constraints.
The rules used for L0 segments are the following:
1) Each segment must contain at least two distinct sounds.
2) Each segment must begin on a marked tone and may end on a marked or unmarked tone.
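As a minimal sketch, the two rules above could be enumerated as follows. The encoding of a tone as a (pitch, marked) pair is an assumption made for illustration, not the model's actual data structure.

```python
# Hedged sketch of L0 segment generation under the two rules above.
# A tone is represented as a (pitch, marked) pair; this encoding is an
# assumption for the example.

def l0_segments(tones):
    """Enumerate candidate L0 segments as (start, end) index pairs."""
    segments = []
    for i, (_, marked) in enumerate(tones):
        if not marked:
            continue  # rule 2: a segment must begin on a marked tone
        for j in range(i + 1, len(tones)):
            pitches = {p for p, _ in tones[i:j + 1]}
            if len(pitches) >= 2:  # rule 1: at least two distinct sounds
                segments.append((i, j))  # rule 2: the end tone may be unmarked
    return segments
```

For the toy input `[(60, True), (60, False), (62, True), (64, False)]` this yields `[(0, 2), (0, 3), (2, 3)]`: every candidate starts on a marked tone and spans at least two distinct pitches.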
The constraints used to reduce the number of L0 segments may be biased either by discontinuity or by continuity. In our hypothesis, continuity and discontinuity behave as
powerful structuring forces in subjective auditory organization during music composition, performance and listening. The emphasis on one or the other depends on
individual tendencies but has its roots in historical and aesthetic traditions. For instance, today's performances of baroque music emphasize discontinuities, whereas
continuities were much more stressed in early twentieth-century interpretations (Harnoncourt, 1984).
Constraints based on discontinuity stress the contrast among inner differences and thus induce further subdivisions of the excerpt. On the other hand, more
continuity-oriented constraints often lead to the perception of a unified auditory image.
In its present version our model is more influenced by discontinuity constraints: the preferred segments to be grouped on level L1 are those that begin and end with
tones marked with a discontinuity.
Each group of two or three "well-formed" L0 segments produces a segment on the L1 layer. Segments of similar duration are preferred over others; that is,
segments of different lengths are more likely to be assigned to different layers than to be grouped on the same temporal span. L1 segments are combined into L2
segments according to the same principles.
Discontinuity amplification
In the discontinuity amplification phase, amplitude and duration discontinuities are added to each well-formed segment of the L1 and L2 layers. Discontinuity positioning
cannot be defined by strict rules but depends, once again, on a fairly variable combination of discontinuity and continuity constraints.
Nevertheless, once defined, the introduced discontinuities must cohere across the different layers. In other words, the position of a discontinuity on a higher-layer
segment must coincide with one of the discontinuities introduced on the subdivisions of that segment.
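This cross-layer coherence requirement amounts to a simple containment check. The sketch below is illustrative (positions as note indices is an assumption), not the model's implementation:

```python
# Minimal sketch of cross-layer coherence: every discontinuity placed on a
# higher layer must coincide with one already introduced on that segment's
# subdivisions. Positions are illustrative note indices.

def coherent(higher_layer_positions, lower_layer_positions):
    """True if each higher-layer discontinuity lies on a lower-layer one."""
    return set(higher_layer_positions) <= set(lower_layer_positions)
```

For example, placing L2 discontinuities at positions 4 and 8 is coherent with L1 discontinuities at 2, 4, 6 and 8, but a discontinuity at position 3 with no L1 counterpart would violate the constraint.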
file:///g|/Tue/Ciardi.htm (5 of 11) [18/07/2000 00:33:59]
L'amplificazione delle discontinuità in stimoli acustici
Discontinuity strength varies according to the layer of the segment: the higher the segmentation layer, the stronger the discontinuity.
In our model, in order to make subjects feel metrical continuity within each melodic excerpt, amplitude discontinuities are added on notes that are marked both with
discontinuities and with pulse continuities. Amplitude discontinuities are marked with an accent (>) on the output score. Duration discontinuities are added by shortening the
last note of each segment and are indicated with a staccato (.). The size of the sign varies according to the strength of the discontinuity.
3. Experimental results
An indirect validation of the model can be found in experimental results from research on recognition memory for unknown musical fragments (Olivetti
Belardinelli et al., 2000). The experimental materials for this research derive from 48 melodic fragments devised by the author for experiments on recognition memory
(Olivetti Belardinelli, Cifariello Ciardi & Rossi-Arnaud, 1998). The original MIDI files used to record the original set of stimuli were modified according to the
described model and recorded using a Digidesign SampleCell piano tone. The introduced amplitude and duration discontinuities were realized with a 70% increase in
velocity (i.e. a MIDI correlate of loudness) and a 40% decrease in duration on L2 segments. Discontinuities on L1 segments were realized with a 50% increase in
velocity and a 20% decrease in duration. The experimental data show that subjects are more likely to remember stimuli affected by
discontinuity amplification.
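Using the percentages stated above, the modification of a single MIDI note could be sketched as follows. The (velocity, duration) representation of a note is an assumption for illustration; only the percentages come from the text.

```python
# Sketch of the discontinuity amplification applied to the MIDI stimuli,
# using the layer-dependent percentages given in the text. The
# (velocity, duration_ms) note representation is an assumption.

GAINS = {"L2": (0.70, 0.40),  # +70% velocity, -40% duration
         "L1": (0.50, 0.20)}  # +50% velocity, -20% duration

def amplify(velocity, duration_ms, layer):
    """Apply the amplitude/duration discontinuity for the given layer."""
    gain, cut = GAINS[layer]
    new_velocity = min(127, round(velocity * (1 + gain)))  # MIDI velocity cap
    new_duration = duration_ms * (1 - cut)
    return new_velocity, new_duration
```

Note the clamp at 127: MIDI velocity is a 7-bit value, so a 70% increase on an already loud note saturates rather than overflowing.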
4. Conclusions
We are cautious about drawing conclusions from the above-mentioned experiments because they were not directly designed to test the model. These issues aside, the results
seem to suggest that discontinuity amplification may improve melodic processing regardless of musical language (e.g. tonal, atonal, serial). Moreover, since sequential
acoustic events are organized according to criteria of spectral continuity (McAdams, 1984), a continuity/discontinuity framework would be of interest because it provides
hints for a generalized analysis of sonic surfaces.
Clearly, future research must provide experimental evidence for many of the empirical constraints included in the model. Moreover, since the analysis phase is based on
relatively few sonic variations, further studies must determine to what extent discontinuities emerging from more complex parameter interactions (e.g. timbre and
dissonance variations, symmetry detection) affect listeners' auditory organization.
5. References
Bregman, A.S. (1990). Auditory Scene Analysis. Cambridge, MA: MIT Press.
Deutsch, D. (1982). Grouping mechanisms in music. In D. Deutsch (Ed.), The Psychology of Music. New York, NY: Academic Press.
Dowling, W.J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W.J. & Harwood, D.L. (1986). Music Cognition. New York, NY: Academic Press.
Fraisse, P. (1974). Psychologie du rythme. Paris: Presses Universitaires de France.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The Psychology of Music. New York, NY: Academic Press.
Friberg, A. & Sundberg, J. (Eds.) (1995). Grammars for Music Performance. Proceedings of the KTH Symposium. Stockholm: KTH.
Harnoncourt, N. (1984). Der musikalische Dialog. Salzburg: Residenz Verlag.
Imberty, M. (1996). Ordine e disordine. Un punto di vista psico-cognitivo sulla creazione musicale. In R. Di Matteo (Ed.), Psicologia Cognitiva e Composizione Musicale. Roma: Edizioni Kappa.
Lerdahl, F. & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Olivetti Belardinelli, M., Cifariello Ciardi, F. & Rossi-Arnaud, C. (1998). Recognition memory for previously novel musical themes in children. In Proceedings of the XV Congress of the International Association of Empirical Aesthetics. Roma: Edizioni Universitarie Romane.
Olivetti Belardinelli, M., Rossi-Arnaud, C., Pitti, G. & Vecchio, S. (2000). Looking for the anchor points for musical memory. ICMPC2000 Proceedings.
Tekman, H.G. (1997). Interactions of perceived intensity, duration, and pitch in pure tone sequences. Music Perception, 14, 281-294.
Tekman, H.G. (1998). Effects of melodic accents on perception of intensity. Music Perception, 15, 391-401.
Wood, N.L. & Cowan, N. (1995). The cocktail party phenomenon revisited: Attention and memory in the classic selective listening procedure of Cherry (1953). Journal of Experimental Psychology: General, 124, 243-262.
Proceedings paper
Actions aimed at achieving a sense of control over life in spite of the changes brought by illness have
been entitled ‘accommodation’ (Corbin and Strauss, 1987: 250). After the loss of self and wholeness
which chronic illness causes, a new self concept can only be reconstructed with the possibility of
discovery of new actions, thereby transcending the body. However, in reality the experience of living
with chronic degenerative neurological illness involves any combination of the loss of physical,
sensory or cognitive abilities, and loss of control over one's present and future. Loss of control in
effect raises questions about whether ill people will live, or whether they want to (Charmaz, 1991).
The ‘social’ component also must remain a central issue in work with the chronically ill, as prolonged
immersion in illness takes its toll upon social relationships and self (Charmaz, 1991). Social isolation
translates directly into emotional isolation and loneliness.
Music therapy is the planned and intentional use of music to meet an individual's social,
emotional, physical, psychological and spiritual needs within an evolving therapeutic relationship. In
the therapy session, the therapist and client explore the client's world together, basing all interaction
on the client's musical utterances or musical preferences. This forms the basis for the therapeutic
relationship.
The growing body of music therapy literature on neuro-degenerative populations claims that music
therapy intervention with individuals with chronic neuro-disability can effect change in self-concepts,
for example ‘self-esteem’, ‘self-image’ or ‘self-worth’ (McMaster, 1991; Purdie & Baldwin, 1994;
Magee, 1999a, b & c). In a group study of music therapy with patients with Multiple Sclerosis, themes
which emerged in the therapy were disability, uncertainty, anxiety, depression, and loss of self-esteem
(Lengdobler and Kiessling, 1989). Challenging identity through music therapy to form a new aesthetic
identity, which transcends the physical, also has been discussed in the role of music therapy in chronic
illness and with the dying (Aldridge, 1995 & 1996). There remains, however, no empirical
explanation of how music therapy may impact upon identity constructs so as to promote positive
self-concepts.
Aims.
The case study presented here has been selected from a larger study which examined broader clinical
issues in music therapy with individuals with chronic neurological illness (Magee 1998). For the
purposes of this paper, the research focus was to examine how the experience of clinical
improvisation in music therapy changed self-concepts for an individual with complex
neuro-disabilities caused by Multiple Sclerosis.
Method.
A single case study has been selected from a larger group study which recruited six adults with
Multiple Sclerosis at a residential and day-care facility for complex neuro-disability. Research
participants were selected using purposive sampling procedures from multidisciplinary referrals and
self-referrals to a music therapy research study. An assessment determined whether music therapy was
a relevant intervention for individuals' particular needs. The participants gave written consent to
participate in this research, receiving individual music therapy as part of a wider clinical programme.
The music therapist was the primary researcher and so worked as a participant researcher for the
study.
The music therapy sessions took place on a weekly basis for approximately six months for each
participant. The session format included active participation in exploring instruments, joint clinical
improvisation with the therapist and singing songs of the participant's choice which held personal
meaning to the research participant. Discussion of the musical material or personal material relating to
Conclusions.
Jessie needed meaningful interaction on an emotional level, in which her emotional responses to
loss were heard and accepted; this was her core experience in the clinical improvisations. Through
her role within the improvisation, the therapist not only validated Jessie’s emotional
experience as expressed through music, but also served as a ‘performance validator’, essential in
redefining concepts of identity (Corbin & Strauss, 1987). For Jessie, the validation of her emotional
experience reflects the parallel made by Aldridge between the activity of mutual music making within
music therapy and the ‘affirmation of worth’ which validates the individual’s experience of hope
(Aldridge, 1995: 106).
The manner in which Jessie played out control in her life is also a significant feature in this case
study. Jessie gained control over her loss and dependence by actively refusing novel activities which
may have allowed her opportunities to develop new skills. Her ‘accommodation’, therefore, increased
her control in one sense but also served to reinforce her isolation and disabled identity. Clinical
improvisation within a therapeutic relationship provided her with the opportunity to experience
control and discovery of new skills through the interactive elements of spontaneous, non-verbal
music-making. Although her sense of identity was severely damaged, the seeds of the process
described as ‘identity reconstitution’ (Corbin & Strauss, 1987) were evident in Jessie’s heightened
emotional responses within improvisation, as she moved towards a greater sense of wholeness.
Jessie reflects the picture of the chronically ill individual who is ‘so immersed in illness that they
cannot readily claim other identities in the external world’ or ‘move on’ from their preoccupation with
loss (Charmaz, 1987). The external world did not exist for Jessie, either through visual images,
physical presence, or access. Therefore opportunities to claim a more able identity were very limited
and immensely difficult for her. Brooks and Matson (1987) further describe the process of isolation
for the chronically ill individual, which stems from a shift in self perception, strained social
relationships and changed relationships with intimates caused by the increasing dependence on others.
For Jessie, the interpersonal connections which took place within improvisation provided the support
through the musical relationship, and reassurance through the reflection and development of her
musical ideas. She was unable to gain reassurance in this way in her verbal interactions.
Jessie had little opportunity for change, development or progress in her life due to her disabilities and
coping mechanisms. Frequently within sessions she expressed the wish to die; however, her
experience of improvisation did give her hope and a sense of development. Charmaz (1987)
suggests that motivation for the chronically ill individual is a result of developing a personal identity
which encompasses future selves, reflecting hopes and aspirations. Dimensions for fostering hope
have been allied with the music therapy context, both in improvisation and the use of pre-composed
music (Aldridge, 1995&1996). Through her own creative process and the shift in identity she
achieved through improvisation, Jessie gained motivation and increased levels of arousal. This was
observable in all aspects of her behaviour during and at the end of sessions, contrasting greatly with
the depressed affective state and reduced energy levels with which she presented every week prior to
sessions when collected from the ward.
The results of this study show that the active music therapy process reduces isolation, thereby
enabling the individual to challenge his or her concept of self. Clinical improvisation centres
on an interactive relationship conducted on an equal basis through the physical act of playing. This
addresses fundamental issues concerning dependency, a crucial issue in forming
concepts of self and identity, as individuals with chronic illness express a greater fear of dependence,
debility and abandonment than of death itself (Charmaz, 1991). The experience of clinical
improvisation stimulated shifts in identity and is therefore an effective means of addressing the
‘spoiled identity’. This case study illustrates that the interactive nature of clinical improvisation as an
intervention with individuals with chronic disability and illness may provide validation of positive
feelings of self-esteem and identity. Clinical improvisation can facilitate the emergence of new
and undiscovered skills and develop a wholeness of self, thereby shifting identity towards a preferred
sense of self.
References.
Aldridge, D. (1995). Spirituality, Hope and Music Therapy in Palliative Care. The Arts in
Psychotherapy, 22(2), 103-109.
Aldridge, D. (1996). Music Therapy Research and Practice in Medicine. London: Jessica Kingsley
Publishers.
Brooks, N. & Matson, R. (1987). Managing Multiple Sclerosis. In J. Roth & P. Conrad (Eds.),
Research in the Sociology of Health Care: A Research Annual. The Experience and Management of
Chronic Illness, 6, 73-106. London: JAI Press Inc.
Corbin, J. & Strauss, A. (1987). Accompaniments of Chronic Illness: Changes in Body, Self,
Biography, and Biographical Time. In J. Roth & P. Conrad (Eds.), Research in the Sociology of
Health Care: A Research Annual. The Experience and Management of Chronic Illness, 6, 249-281.
London: JAI Press Inc.
Charmaz, K. (1987). Struggling for a self: Identity levels of the chronically ill. In J. Roth & P.
Conrad (Eds.), Research in the Sociology of Health Care: A Research Annual. The Experience and
Management of Chronic Illness, 6, 283-321. London: JAI Press Inc.
Charmaz, K. (1991). Good Days, Bad Days: The Self in Chronic Illness and Time. New Brunswick:
Rutgers University Press.
Conrad, P. (1987). The Experience of Illness: Recent and New Directions. In J. Roth & P. Conrad
(Eds.), Research in the Sociology of Health Care: A Research Annual. The Experience and
Management of Chronic Illness, 6, 1-31. London: JAI Press Inc.
Lengdobler, H., & Kiessling, W.R. (1989). ‘Gruppenmusiktherapie bei multipler Sklerose: Ein erster
Erfahrungsbericht’. Psychotherapie, Psychosomatik, Medizin und Psychologie, 39, 369-373.
Magee, W. (1998) ‘Singing my life, playing my self’. Investigating the use of familiar pre-composed
music and unfamiliar improvised music in clinical music therapy with individuals with chronic
neurological illness. Unpublished doctoral dissertation, University of Sheffield, UK, #9898.
Magee, W. (1999a) ‘Music Therapy in Chronic Degenerative Illness: Reflecting the Dynamic Sense
of Self’. In Ed. D. Aldridge, Music Therapy in Palliative Care, 82-94. London: Jessica Kingsley
Publishers.
Magee, W. (1999b) ‘Singing my life, playing my self’: Song Based and Improvisatory Methods of
Music Therapy with Individuals with Neurological Impairments. In T. Wigram & J. De Backer, (Eds.)
Clinical Applications of Music Therapy in Developmental Disability, Paediatrics and Neurology,
201-223. London: Jessica Kingsley Publishers.
Magee, W. (1999c) ‘Musiktherapie bei chronisch degenerativen Krankheiten: Eine Wiederspiegelung
des dynamischen Selbst’. In Aldridge, D. (Ed.) Kairos IV, Berne: Huber Verlag.
McMaster, N. (1991). Reclaiming A Positive Identity: Music Therapy In The Aftermath Of A Stroke.
In: Bruscia, K.E. (Ed.), Case Studies in Music Therapy, 547-560. Philadelphia: Barcelona Publishers.
Purdie, H. & Baldwin, S. (1994). Music Therapy: Challenging Low Self-Esteem in People With a
Stroke. British Journal of Music Therapy, 8(2), 19-24.
Robinson, I. (1988). Multiple Sclerosis. London: Routledge.
Strauss, A. & Corbin, J. (1990). Basics of Qualitative Research. Grounded Theory Procedures and
Techniques. Newbury Park: Sage Publications, Inc.
Authors’ note.
Wendy L. Magee BMus PhD ARCM SRAsT(M) is Head of Music Therapy at the Royal Hospital for
Neuro-disability, London, holding a clinical post as a music therapist working with adults with
acquired and complex neuro-disability, and developing research projects with this population. This
research is part of doctoral research undertaken whilst registered at the Department of Music,
University of Sheffield. Jane W. Davidson BA PGCE MA PhD Cert. Counselling is Senior Lecturer
in Music at the Department of Music, University of Sheffield. She is editor of the international journal
Psychology of Music and has researched on a wide range of topics from self and identity in singers
through to expressive body movement and piano performance, having over 50 publications to her
name in international peer-reviewed journals. Besides researching, she teaches a wide range of
courses and is an active performer, artistic director and producer.
The authors would like to acknowledge the Living Again Trust, the John Ellerman Foundation, the
Juliette Alvin Trust and the Music Therapy Charity, who all contributed to funding this project. The
authors also would like to thank the research participants who took part in this study. The Royal
Hospital for Neuro-disability received a proportion of its funding to support this paper from the NHS
Executive. The views expressed in this publication are those of the authors and not necessarily those
of the NHS Executive.
Address for correspondence: Dr. Wendy L. Magee, Music Therapy Department, Royal Hospital for
Neuro-disability, West Hill, London SW15 3SW, UK
Proceedings paper
The ability to produce regularly timed rhythmic actions and the corresponding ability to perceive (deviations from)
regularity in a sequence of events presuppose a timekeeping mechanism in the brain that oscillates in a periodic
fashion and also can adapt to deviations from perceived isochrony by changing its period and/or relative phase.
Mental timekeepers or oscillators underlying rhythmic action have been discussed by many authors representing
diverse theoretical orientations, such as Michon (1967), Kugler and Turvey (1987), and Vorberg and Wing (1996).
For perception, a corresponding theory of attentional rhythms or oscillators has been proposed by Jones (1976) and
elaborated in many subsequent articles, most notably in Large and Jones (1999).
One important theoretical issue is whether perception and action are subserved by a single general timekeeper or
whether separate, perhaps even task-specific timing processes are involved. Working within a traditional
information-processing framework, Keele, Ivry, and colleagues have presented evidence suggesting a common
timing mechanism in perception and production of simple event sequences (Keele & Hawkins, 1982; Keele,
Pokorny, Corcos, & Ivry, 1985; Ivry & Keele, 1989; Ivry & Hazeltine, 1995). According to a dynamic systems
perspective, however, timing is an emergent property and thus may be specific to different activities (e.g.,
Robertson et al., 1999; Turvey, 1977; Wallace, 1996). By analogy, this view might also predict task-specific
perceptual processes with regard to timing. However, the dynamic systems view also posits a close relationship
between perception and action within the same task situation (see, e.g., Viviani & Stucchi, 1989, 1992a, 1992b).
The two experiments reported here are pertinent to the hypothesis of a common timekeeping mechanism for
perception and action, addressed here by investigating whether contextual timing variation has similar effects on
detection and synchronization accuracy in musical sequences containing small deviations from isochrony.
Experiment 1 was concerned with perception, Experiment 2 with action.
EXPERIMENT 1
Experiment 1 continued a long series of experiments on the detectability of small timing perturbations in
isochronous musical excerpts (Repp, 1992a, 1992b, 1995, 1998b, 1998c, 1998d, 1999b, 1999c). Two consistent
findings in that research were that the detectability of a hesitation (a single lengthened inter-onset interval,
IOI) varies greatly with its position in the musical structure, and that the resulting "detection accuracy profile"
(i.e., percent correct detection as a function of position) is negatively correlated with the typical timing pattern (i.e.,
IOI duration as a function of position) of artists' expressive performances of the musical excerpt. In one variation of
the basic paradigm (Repp, 1998b: Exp. 1), each presentation of the test excerpt was preceded by an expressively
timed performance of the same music. This precursor made the hesitations in the test excerpt more difficult to
detect than in an earlier study in which no precursor had been employed. Moreover, this interference effect did not
seem to decrease in the course of the test excerpt, which lasted close to 20 seconds.
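The negative relation described above, between the detection accuracy profile and the typical expressive timing pattern, can be sketched as follows. The arrays are invented toy data for illustration only, not results from these studies:

```python
import numpy as np

# Toy data (hypothetical): correct detections per position out of 20 trials,
# and mean expressive IOI durations (ms) from expert performances.
hits = np.array([18, 15, 9, 6, 14, 17, 8, 5])
mean_ioi = np.array([505, 512, 540, 560, 515, 503, 548, 565])

# Detection accuracy profile: percent correct detection as a function of position.
profile = 100.0 * hits / 20

# Positions that are typically lengthened in expressive performance tend to show
# poor detection, so the profile/timing correlation is negative.
r = np.corrcoef(profile, mean_ioi)[0, 1]
```

With data of this kind, positions carrying large expressive lengthening yield the poorest detection, producing a strongly negative correlation coefficient.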
The interference was believed to be due to the temporal variation in the precursor, not to the specific pattern of that
variation. Large and Jones (1999: Exp. 1) recently reported striking context effects of a similar nature in a
nonmusical paradigm (see also Jones & Yee, 1997; Yee, Holleran, & Jones, 1994). Large and Jones attributed these
effects to a widening of the temporal expectancy region of an attentional oscillator, and to a slow rate of adaptation
of this mechanism. The findings of Jones and collaborators suggest that the musical precursor effect observed by
Repp (1998b) was due to the same attentional mechanism. However, it seemed desirable to replicate it in a
within-participant design contrasting different precursor conditions. Moreover, if the effect is due to timing
variability as such, then it should not be necessary to employ a musical precursor; a simple tone or click sequence
would do. These considerations led to a design with four precursors: (1) isochronous music; (2) expressively timed
music; (3) isochronous clicks; and (4) "expressively timed" clicks (i.e., with the same timing pattern as the music).
The predictions were that the accuracy of detecting small hesitations in the following musical test excerpts would
be lower after temporally modulated than after isochronous precursors, and that there would be little difference
between the music and click precursor conditions. In addition, by comparing the detection accuracy profiles for the
test excerpts in the different precursor conditions, it could be determined whether or not the precursor effect
declined in the course of the test excerpt.
Method
The musical excerpt was the opening of Chopin's Etude in E major, op. 10, No. 3, the score of which is shown
below. With the exception of the initial eighth-note upbeat (which was excluded from all analyses of timing), all
inter-onset intervals (IOIs) in the excerpt are nominally sixteenth-note intervals, as long as distinctions among
voices are disregarded. In a completely deadpan version of the excerpt, which served as the basis for the
experimental materials, all IOIs were set to 500 ms, and all keypress velocities were set to the same arbitrary MIDI
value of 60.
The test sequences of the main experiment were not entirely isochronous but each contained four lengthened IOIs
("hesitations"). These IOIs were lengthened by the same amount ∆t and occurred at unpredictable locations,
separated by at least four unchanged IOIs. All tones sounding during a lengthened IOI were lengthened by ∆t,
too, so that legato articulation was maintained. In the course of a block of 9 trials, each of the 36 IOIs in the excerpt
was lengthened once by ∆t. Three different blocks with different randomizations of the lengthened IOIs were
created for each ∆t value. The ∆t values ranged from 80 ms (16%) to 20 ms (4%) in steps of 10 ms.
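The construction of a test block can be sketched as follows. This is a minimal sketch assuming one particular assignment scheme (each trial lengthens four positions spaced nine IOIs apart), which satisfies the stated constraints; the original randomization procedure is not specified beyond those constraints:

```python
import random

N_IOIS = 36       # sixteenth-note IOIs in the excerpt (the upbeat is excluded)
BASE_IOI = 500    # ms, deadpan tempo
DT_VALUES = list(range(80, 10, -10))  # 80 ms (16%) down to 20 ms (4%) in 10 ms steps

def make_trial(positions, dt):
    """IOI sequence (ms) for one trial, with the given positions lengthened by dt."""
    iois = [BASE_IOI] * N_IOIS
    for p in positions:
        iois[p] += dt
    return iois

def make_block(dt, rng=random):
    """One block of 9 trials in which each of the 36 positions is lengthened
    exactly once (4 hesitations per trial). Each trial lengthens positions
    o, o+9, o+18, o+27 for a distinct random offset o, so hesitations within a
    trial are always separated by 8 unchanged IOIs (satisfying the >= 4 rule)."""
    offsets = list(range(9))
    rng.shuffle(offsets)
    return [make_trial([o, o + 9, o + 18, o + 27], dt) for o in offsets]
```

Three such blocks per Δt value, with different randomizations, would reproduce the design's counterbalancing of hesitation positions.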
The expressively timed music precursor was a synthesized expressive performance of the same Chopin excerpt.
The (atypical) timing pattern was derived from a principal component analysis of a large sample of expert
performances (Repp, 1998a). The standard deviation of its IOI durations was 80 ms. The precursor also contained
typical expressive dynamic variation as well as small tone onset asynchronies and pedaling, to enhance its
naturalness. An isochronous music precursor was derived from the expressively timed one by setting all IOIs to
500 ms, leaving all other temporal details relatively invariant (see Repp, 2000b). The expressively timed click
precursor consisted of a sequence of 38 very high-pitched (C8, 4,186 Hz), rapidly decaying digital piano tones of
equal intensity, called here "clicks" for simplicity, which were timed in exactly the same way as the top-line tones
of the expressively timed music. The isochronous click precursor had constant IOIs of 500 ms. All materials were
generated on a Roland RD-250s digital piano under control of a Macintosh Quadra 660AV computer via a MIDI interface.
As in previous studies of a similar nature, the detectability of hesitations varied greatly across positions in the
music, F(35,385) = 14.3, p < .0001. However, none of the interactions of position with precursor type or timing
approached significance, indicating that the precursor effects did not decrease across positions in the test excerpt.
EXPERIMENT 2
The results of Experiment 1, in conjunction with those of Repp (1998b) and Large and Jones (1999), show that a
temporally modulated context reduces listeners' ability to detect small deviations from isochrony in a test sequence.
The underlying mechanism suggested by Large and Jones is an attentional oscillator whose temporal expectancy
region widens after exposure to temporal variability, making it more tolerant of small deviations from temporal
expectancies. The width of a temporal expectancy window is formally equivalent to the probability distribution of
specific temporal expectations, so that a wider window implies greater variability of the underlying timekeeper.
Therefore, if the adaptive attentional oscillator is identical with the timekeeping mechanism governing regular
motor activity, such as finger tapping in synchrony with an isochronous sequence, then one should expect
temporally modulated precursors to increase the variability of subsequent finger tapping. These precursors should
also reduce the timekeeper's sensitivity to small deviations from isochrony in a stimulus sequence, as reflected in
the speed of motor compensation (phase error correction) following such timing perturbations.
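The notion of phase error correction can be illustrated with a generic first-order linear correction model (cf. Vorberg & Wing, 1996); this is a sketch of the general mechanism, not the model fitted in these experiments:

```python
def simulate_taps(iois, alpha, period=500.0):
    """Simulate synchronized tapping with first-order linear phase correction:
    each tap is scheduled one timekeeper period after the previous one, minus
    a fraction alpha of the current tap-tone asynchrony. Returns the list of
    asynchronies (ms), one per tone onset."""
    tones = [0.0]
    for ioi in iois:
        tones.append(tones[-1] + ioi)
    tap, asynchronies = 0.0, []
    for tone in tones:
        a = tap - tone               # negative asynchrony: tap leads the (delayed) tone
        asynchronies.append(a)
        tap += period - alpha * a    # phase error correction
    return asynchronies
```

With a single IOI lengthened by 20 ms, the asynchrony jumps to -20 ms at the perturbation and then decays geometrically by a factor (1 - alpha) per tap, mimicking rapid motor compensation.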
Repp (2000a: Exp. 4) conducted an experiment with the Chopin Etude excerpt in which each test trial was preceded
by an expressively timed precursor. The test trials contained subliminal hesitations (lengthened IOIs). Motor
compensation for these perturbations was just as rapid following the expressively timed precursor as in a condition
without a precursor (Repp, 2000a: Exp. 3). The variability of the tap timing was likewise unaffected by the
presence of the precursor. These results suggested a possible dissociation between the timekeeping processes
involved in perception and in motor control. However, the precursors were merely listened to and thus did not
require any overt motor activity. Experiment 2 investigated whether the added requirement of tapping in synchrony
with the expressively timed precursors would result in increased variability and in slower phase correction in
subsequent synchronized tapping with isochronous musical test excerpts containing small hesitations.
Method
The materials were a subset of those of Experiment 1. Only musical precursors were used, either isochronous or
expressively timed, together with test excerpts containing hesitations of 20 ms. Three blocks with different
randomizations of the hesitations were used. Each block comprised 9 trials, each of which contained 4 hesitations.
The 12 participants were mostly musically trained undergraduates. They tapped with their preferred hand on a
white key of a Fatar Studio 37 MIDI controller (a silent three-octave piano keyboard) which they held on their lap.
The key depressions were recorded via a MIDI interface by a MAX patcher that also controlled presentation of the
musical excerpts. Otherwise, the equipment was the same as in Experiment 1.
Participants were given a few practice trials, followed by three blocks of test trials without precursors. Tapping
started with the first downbeat (the second tone) in each excerpt and continued in synchrony with the sixteenth
notes, for a total of 37 taps. Participants were not informed about the hesitations, which generally were near or
below their detection threshold. The main part of the experiment consisted of six blocks of test trials, with each trial
being preceded by a precursor. Precursors were constant within each block but alternated between blocks, with
some participants starting with the isochronous precursor and others with the expressively timed one. Participants
were requested to tap in synchrony with each precursor.
Results and discussion
The variability of the taps was assessed in four different ways: The average standard deviations of tap-tone
asynchronies and of inter-tap intervals (ITIs) were computed both within and between trials. In each case, the initial
three taps (during which participants adjusted to the sequence tempo) and the final tap in each trial were
disregarded. Within-trial standard deviations were also calculated for the asynchronies and ITIs generated in
synchronizing with the precursors themselves.
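These four measures can be computed as sketched below, assuming the asynchronies are stored as one row per trial; the same function applies to matrices of inter-tap intervals:

```python
import numpy as np

def variability(x):
    """x: (n_trials, n_taps) array of tap-tone asynchronies (or ITIs) in ms.
    Returns the average within-trial and between-trial standard deviations,
    discarding the first three taps (tuning in) and the final tap of each trial."""
    x = np.asarray(x, dtype=float)[:, 3:-1]
    within = x.std(axis=1, ddof=1).mean()   # SD across positions, averaged over trials
    between = x.std(axis=0, ddof=1).mean()  # SD across trials, averaged over positions
    return within, between
```

ITIs can be obtained from raw tap times with `np.diff` before applying the same function.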
The results are summarized in Table 2. The finding of main interest is the difference in tap variability following
expressively timed and isochronous precursors, shown in the last column of the table. As can be seen, this
difference was usually positive, suggesting higher variability following temporally modulated precursors, but it was
very small. Nevertheless, it reached significance for within-trial ITIs, F(1,11) = 6.5, p < .03, and for between-trial
ITIs, F(1,11) = 5.6, p < .04. The lower part of Table 2 also reveals that variability in tapping to the isochronous
precursors was comparable to that in tapping to the test excerpts, whereas variability of tapping to expressively
timed precursors was very high.
Table 2. (a) Average standard deviations (in ms) of asynchronies and ITIs within trials (ASY-w and ITI-w,
respectively) and between trials (ASY-b and ITI-b, respectively) for test excerpts in the three precursor conditions,
and the difference between the two precursor conditions, with standard errors in parentheses. (b) Average standard
deviations of within-trial asynchronies and ITIs for tapping in synchrony with the precursors themselves.
(a) Synchronization with test excerpts following precursors
The small differences in ITI variability between precursor conditions could have been due to either
random or systematic variation, or both. Previous investigations (Repp, 1999a, 1999b, 1999c, 2000a) have shown
that the asynchronies and ITIs of taps accompanying perfectly isochronous music exhibit a systematic pattern of
deviations from isochrony. Therefore, these patterns were determined and compared between the precursor
conditions. Before computing the average asynchronies and ITIs, however, the asynchronies of taps coinciding
with hesitations as well as the two subsequent asynchronies were removed from the data, following the procedure
of Repp (1999b: Exp. 2). The averages across the 9 trials in a block thus were based on 6 data points per position.
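The triplet-removal step can be sketched as follows, again assuming one row of asynchronies per trial (the function name and data layout are illustrative, not taken from the original analysis):

```python
import numpy as np

def average_asynchrony_profile(asyn, hesitations):
    """asyn: (n_trials, n_positions) array of asynchronies; hesitations: list
    (per trial) of perturbed positions. Masks each tap coinciding with a
    hesitation and the two subsequent taps, then averages across trials."""
    a = np.ma.masked_array(np.array(asyn, dtype=float))
    for trial, positions in enumerate(hesitations):
        for p in positions:
            a[trial, p:p + 3] = np.ma.masked
    return a.mean(axis=0)
```

Masking rather than deleting the affected taps keeps the position index intact, so each column's average is simply taken over the remaining trials.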
As expected, there was highly significant variation across positions of both asynchronies, F(33,363) = 8.3, p <
.0001, and ITIs, F(32,352) = 13.2, p < .0001. Most interestingly, there was a small but significant Condition x
Position interaction for both asynchronies, F(33,363) = 1.5, p < .04, and ITIs, F(32,352) = 1.8, p < .006: The
profiles were initially more strongly modulated after modulated precursors than after isochronous precursors, and
there was also a larger difference in absolute asynchronies at the beginning. That the interaction derived from the
initial portions of the profiles was confirmed in ANOVAs that omitted the initial 10 data points (not counting the
tuning-in portion) and in which the Condition x Position interaction was far from significance (p > .5).
These results suggested that the small difference in overall ITI variability between the precursor conditions (Table
2) was due to this initial difference in the amplitude of systematic ITI modulation. The standard deviations of the
within-trial and between-trial ITIs were therefore recalculated with the first 10 data points omitted. The resulting
average within-trials values were 22.7 and 22.8 ms for isochronous and modulated precursors, respectively, and the
corresponding between-trial values were 20.5 and 20.7 ms, respectively. Both differences were clearly
nonsignificant. Thus the differences reported in Table 2 were indeed due to the initial portion of the ITI profile
only.
Next, the speed of compensation for hesitations in the test excerpts was examined. The relevant data were the
triplets of asynchronies that had been extracted from the data before computing the average asynchrony and ITI
profiles. There were 36 such triplets in each condition, representing the 36 positions of the perturbation point in the
music. These triplets were further divided into two groups of 16, according to whether the hesitations were of high
or low detectability. (The first two and last two positions were excluded.) The high-low distinction was based on a
median bisection of the average detection accuracy profile obtained in Experiment 1. The results were expressed as
deviations from the average asynchrony profile. The average relative asynchronies were close to -20 ms at the
perturbation point (P) and quickly returned to the zero baseline in the following two positions.
These average "compensation functions" showed two unexpected differences. First, compensation was more rapid
following modulated precursors than following isochronous precursors, F(2,22) = 7.7, p < .003. If anything, the
opposite had been predicted. The second unexpected finding was that compensation was more rapid for
high-detectability than for low-detectability perturbations, F(2,22) = 7.7, p < .003. This result, although plausible,
contradicts an earlier negative result, obtained in a very similar comparison (Repp, 1999b: Exp. 2).
GENERAL DISCUSSION
The present results reveal a partial dissociation of perception and motor control with regard to timing. Experiment
1 replicated the precursor effect found serendipitously by Repp (1998b) and demonstrated its existence more
clearly in a within-participant design: Exposure to a variably timed auditory sequence reduced listeners' sensitivity
to deviations from temporal regularity in a subsequent sequence. Moreover, this effect occurred regardless of the
type of precursor (clicks or music), which suggests that it is not specific to music. The precursor effect is analogous
to similar context effects reported by Large and Jones (1999), who attributed it to the widening of the expectancy
window of a slowly adapting attentional oscillator. However, there was no evidence of a decrease in the precursor
effect in the course of largely isochronous test excerpts lasting close to 20 s. Thus the rate of adaptation seemed to
be very slow indeed.
Very different results were obtained in Experiment 2 with regard to sensorimotor synchronization with musical
stimuli very similar to the ones used in Experiment 1. Previously, Repp (2000a) had found no effects of a
modulated precursor on variability of tap timing or compensation for hesitations when participants merely listened
to the precursor. In Experiment 2, tapping variability was slightly higher following modulated precursors than
isochronous precursors, perhaps due to the requirement that participants tap in synchrony with the precursors, but
this difference disappeared within about 10 taps. Moreover, the difference was due to an increased amplitude of
systematic variability, not of random variability. This suggests a heightened sensitivity to structural musical factors,
such as meter, which was induced by the expressively timed precursor but wore off rapidly during exposure to the
nearly isochronous test excerpt. This process most likely reflects a modulation of the period of the motor
timekeeper, not a widened expectancy window. It cannot account for the precursor effect that lasted throughout the
test excerpt in Experiment 1.
As to compensation for hesitations in the music, which is perhaps more directly relevant to their detectability, it
was found to be more rapid after modulated than after isochronous precursors. This paradoxical result
remains unexplained; notably, it is opposite in direction to the precursor effect on detectability. Thus the negative effect of precursor
timing modulation seems to be largely specific to perception, suggesting that the attentional oscillator governing
time perception is distinct from the mechanism that controls the timing of finger taps.
The dissociation between perception and action would have been even more striking if Experiment 2 had replicated
the finding of Repp (1999b) that compensation for hesitations was independent of their detectability. In Experiment
2, however, compensation was somewhat more rapid in positions of higher detectability. Conscious detection of a
hesitation thus may have accelerated the phase correction process somewhat. This result must be viewed with some
scepticism, however, because it also contradicts previous findings obtained with simple tone sequences (Repp,
2000a: Exp. 5). In any case, the results are consistent with the general finding that phase correction does not
require conscious detection of a perturbation (Repp, 2000a).
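The phase correction discussed here is commonly modeled as a first-order linear process (in the spirit of Mates, 1994a, cited below): on each tap, a fraction alpha of the current asynchrony is subtracted from the next intertap interval. The following sketch is illustrative only; the 500-ms period, the gain of 0.5, and the 50-ms hesitation are assumed values, not parameters estimated from these experiments.

```python
# First-order linear phase correction in sensorimotor synchronization.
# A fraction alpha of the current asynchrony is corrected on the next tap.
# Parameter values below are illustrative assumptions.

def simulate_tapping(onsets, period, alpha):
    """Simulate tap times against a stimulus onset sequence.

    onsets : list of stimulus onset times (ms)
    period : the tapper's timekeeper period (ms)
    alpha  : phase correction gain (0 = no correction, 1 = full correction)
    Returns the asynchronies (tap - onset) at each sequence position.
    """
    tap = onsets[0]                 # assume the first tap is synchronous
    asynchronies = []
    for onset in onsets:
        asyn = tap - onset
        asynchronies.append(asyn)
        # next tap: timekeeper period minus a fraction of the asynchrony
        tap = tap + period - alpha * asyn
    return asynchronies

# An isochronous sequence with one lengthened interval (a "hesitation"):
period = 500.0                          # ms
onsets = [i * period for i in range(10)]
for i in range(5, 10):                  # 50-ms hesitation before onset 5
    onsets[i] += 50.0

asyns = simulate_tapping(onsets, period, alpha=0.5)
# The tap at the perturbation point arrives 50 ms early relative to the
# delayed onset; with alpha = 0.5 the error then halves on every tap,
# mirroring the shape of the "compensation functions" described above.
print([round(a, 1) for a in asyns])
```

With a larger alpha the return to the zero baseline is steeper, which is one way to express the finding that compensation was more rapid in some conditions than in others.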
In summary, the present results suggest that the perception of timing is governed in part or entirely by processes
that are separate from the timekeeper that controls the timing of action in sensorimotor synchronization. There are
close parallels in the ways perceptual and motor timing mechanisms have been conceptualized (e.g., Large &
Jones, 1999; Mates, 1994a, 1994b), but this functional similarity should not be taken to reflect physiological
identity. The most important difference between perception and action is that perceptual tasks generally require
conscious registration of temporal differences and explicit judgments, whereas motor tasks such as sensorimotor
synchronization often rely largely or entirely on subconscious, automatic regulatory processes. Although the
processes underlying perceptual judgments are just as subconscious as those underlying motor control, the
additional processing required for information to reach awareness and for a deliberate response to be made may
introduce random as well as systematic variation. There are many recent demonstrations of dissociations between
perception and action, especially in tasks based on visual information (e.g., Creem & Proffitt, 1998; Gentilucci et
al., 1996; Haffenden & Goodale, 1998; Klotz & Neumann, 1999; Rumiati & Humphreys, 1998), and the present
research adds to the rapidly mounting evidence that action is often based on information that is not fully processed
perceptually (see Neumann, 1990, for relevant discussion).
Acknowledgments
This research was supported by NIH grant MH-51230. I am grateful to Paul Buechler and Steve Garrett for
extensive assistance. Address correspondence to Bruno H. Repp, Haskins Laboratories, 270 Crown Street, New
Haven, CT 06511-6695 (e-mail: repp@haskins.yale.edu).
References
Creem, S. H., & Proffitt, D. R. (1998). Two memories for geographical slant: Separation and
interdependence of action and awareness. Psychonomic Bulletin & Review, 5, 22-36.
Fraisse, P. (1954). La structuration intensive des rythmes. L'Année Psychologique, 54, 35-52.
Gentilucci, M., Chieffi, S., Daprati, E., Saetti, M. C., & Toni, I. (1996). Visual illusion and action.
Neuropsychologia, 34, 369-376.
Haffenden, A., & Goodale, M. A. (1998). The effect of pictorial illusion on prehension and perception.
Journal of Cognitive Neuroscience, 10, 122-136.
Ivry, R. B., & Hazeltine, R. E. (1995). Perception and production of temporal intervals across a range
of durations: Evidence for a common timing mechanism. Journal of Experimental Psychology:
Human Perception and Performance, 21, 3-18.
Ivry, R. B., & Keele, S. W. (1989). Timing functions of the cerebellum. Journal of Cognitive
Neuroscience, 1, 136-152.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and
memory. Psychological Review, 83, 323-355.
Jones, M. R., & Yee, W. (1997). Sensitivity to time change: The role of context and skill.
Keele, S. W., & Hawkins, H. L. (1982). Explorations of individual differences relevant to high level
skill. Journal of Motor Behavior, 14, 3-23.
Keele, S. W., Ivry, R. B., & Pokorny, R. A. (1987). Force control and its relation to timing. Journal of
Motor Behavior, 19, 96-114.
Keele, S. W., Pokorny, R. A., Corcos, D. M., & Ivry, R. B. (1985). Do perception and motor
production share common timing mechanisms: A correlational analysis. Acta Psychologica, 60,
173-191.
Klotz, W., & Neumann, O. (1999). Motor activation without conscious discrimination in metacontrast
masking. Journal of Experimental Psychology: Human Perception and Performance, 25, 976-992.
Kugler, P. N., & Turvey, M. T. (1987). Information, natural law, and the self-assembly of rhythmic
movement. Hillsdale, NJ: Erlbaum.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How we track time-varying events.
Psychological Review, 106, 119-159.
Michon, J. A. (1967). Timing in temporal tracking. Assen, NL: van Gorcum.
Neumann, O. (1990). Direct parameter specification and the concept of perception. Psychological
Research, 52, 207-215.
Repp, B. H. (1992a). Detectability of rhythmic perturbations in musical contexts: Bottom-up versus
top-down factors. In C. Auxiette, C. Drake, & C. Gérard (eds.), Proceedings of the Fourth Rhythm
Workshop: Rhythm perception and production (pp. 111-116). Bourges, France: Imprimerie
Municipale.
Repp, B. H. (1992b). Probing the cognitive representation of musical time: Structural constraints on
the perception of timing perturbations. Cognition, 44, 241-281.
Repp, B. H. (1995). Detectability of duration and intensity increments in melody tones: A partial
connection between music perception and performance. Perception & Psychophysics, 57, 1217-1232.
Proceedings paper
factors, two related to the dissonance of intervals and one related to the structure of the chords. The
presence or absence of semitone/whole-tone dissonance was the most significant factor influencing
the subjects’ responses (p<0.0001) and alone accounted for about half of the variance in the data
(R~0.5). In other words, when subjects heard semitone or whole-tone dissonance in a chord, their
evaluation of its harmoniousness plummeted. A second factor related to interval dissonance, i.e., the
total theoretical dissonance of the intervals contained in the chord (Sethares, 1993), was also found to
be a significant factor (p<0.001). It is noteworthy, however, that, when regression was done using
only the total dissonance of the intervals, this factor alone accounted for little of the variance in the
data (R<0.1). Most interestingly, the presence of chordal tension (as defined above) was also found to
play a significant role (p<0.001). This result indicates that when subjects heard three-tone chords
containing two intervals of the same magnitude, their evaluation of the chords' pleasantness decreased.
In order to determine more precisely what the relevant influences on the perception of harmony are,
we examined three distinct factors related to the dissonance of the intervals in the chords. (1) The first
factor was the total theoretical dissonance of the three intervals in each of the three-tone chords, using
the model of Sethares (1993). That model is based approximately on the empirical dissonance curve
obtained by Plomp and Levelt (1965), which indicates a trough of consonance at an interval of about
1-2 semitones. (2) The second factor was the theoretical dissonance of the intervals in the chords, but
including all of the intervals among the first six upper partials of each of the three tones, using a
model originally presented by Kameoka and Kuriyagawa (1969). It should be noted that the chordal
stimuli in our experiment consisted of three pure sine waves, without any upper partials. This
theoretical factor was nonetheless calculated for use in the multiple regression analysis because most
musical sounds contain upper partials and, as a consequence, normal listening experience may
produce associations between tones and their higher harmonics, even when the higher harmonics are
absent from the current auditory stimulus. (3) Finally, the empirical dissonance curve obtained in the
interval experiment was used to calculate the total dissonance in the chords. This calculation ensured
that factors inherent to our experimental equipment, which influence the perceived pleasantness of
intervals, were also reflected in the evaluation of the perceived harmoniousness of chords measured
with the same equipment. Small differences in the influence of the various interval
dissonance factors on chord perception were indeed found, but the main result was that: (1) the
(empirical or theoretical) dissonance of intervals alone does not explain the harmoniousness of chords,
and (2) an independent factor related to the structure of the three-tone chord ("chordal tension") had a
significant influence on the evaluation of harmoniousness.
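As an illustration of factor (1), the pairwise dissonance curve of Sethares (1993) can be summed over the three intervals of a chord. The parameter values below are the ones commonly quoted for that model (an assumption of this sketch, not taken from the present paper), and the equal-tempered chords built on 440 Hz are hypothetical examples, not the stimuli of the experiment.

```python
import math
from itertools import combinations

# Pairwise dissonance of two pure tones after Sethares (1993), whose curve
# approximates the Plomp & Levelt (1965) data. The constants below are the
# commonly quoted parameterization; treat them as assumptions.
def pair_dissonance(f1, f2, a1=1.0, a2=1.0):
    b1, b2 = 3.5, 5.75             # curve steepness constants
    dstar, s1, s2 = 0.24, 0.021, 19.0
    s = dstar / (s1 * min(f1, f2) + s2)   # scales the curve with register
    df = abs(f2 - f1)
    return a1 * a2 * (math.exp(-b1 * s * df) - math.exp(-b2 * s * df))

# Total theoretical dissonance of a three-tone chord of pure sine waves:
# the sum over its three interval pairs (factor 1 above).
def chord_dissonance(freqs):
    return sum(pair_dissonance(f1, f2) for f1, f2 in combinations(freqs, 2))

# Illustrative comparison: a major triad versus a cluster-like chord
# containing semitone/whole-tone intervals, built on A4 = 440 Hz.
semitone = 2 ** (1 / 12)
major = [440.0, 440.0 * semitone ** 4, 440.0 * semitone ** 7]
cluster = [440.0, 440.0 * semitone, 440.0 * semitone ** 2]
print(chord_dissonance(major) < chord_dissonance(cluster))  # expect True
```

Note that such a pairwise sum is blind to chordal tension as defined above: two chords with identical interval content receive the same score, which is why an independent structural factor was needed in the regression.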
5. Conclusions
We conclude that chordal tension is a feature of certain three-tone chords; the tension is perceived by
normal listeners in a manner similar to, but distinct from, the perception of the dissonance of two-tone
intervals. The results suggest that chordal tension is perceived as a psychological Gestalt inherent to
certain three-tone combinations and that the perception of such tension may explain the "instability"
or unresolved nature of augmented and diminished chords without consideration of higher harmonics.
The combination of tones at which resolution of harmonic tension is obtained (and indeed the tones at
which resolution of interval dissonance is obtained) depends heavily upon the scalar intervals used in
the given musical culture, but the need to resolve chordal tension may be a feature as common to all
forms of polyphonic music as is the need to resolve interval dissonance.
References
Kameoka, A. and Kuriyagawa, M. (1969) Consonance theory. Journal of the Acoustical
Society of America 45, 1452-1459; 45, 1460-1469.
Plomp, R. and Levelt, W.J.M. (1965) Tonal consonance and critical bandwidth. Journal
of the Acoustical Society of America 38, 548-560.
Sethares, W. A. (1993) Local consonance and the relationship between timbre and scale.
Journal of the Acoustical Society of America 94, 1218-1228.
Proceedings paper
Abstract
This study investigates the strategies recommended by concert pianists to memorize two pieces of the
piano literature. Semi-structured interviews were conducted with four classically trained concert
pianists who were asked to describe the recommendations they would give to a proficient piano
student to memorize Chopin Prelude in E minor, Op. 28, No.4, and J.S. Bach Prelude in C major from
Book 1 of The Well Tempered Clavier. For both compositions the pianists recommended a detailed
analysis of the music as the most important variable for acquiring a secure memory.
Their recommendations included:
1. study the overall structure of the piece and divide it into sections;
2. look for specific melodic and harmonic patterns and understand their function within the
composition;
3. block the chords so the piece can be played as a chord progression.
In addition, some suggestions based on the use of visual, auditory, and kinesthetic memory were also given.
Introduction
Learning piano pieces from memory is undertaken by students and professional pianists every day, yet
relatively little is known about this topic from both a pedagogical and a psychological perspective.
The books On memorizing and playing from memory and on the laws of practice generally written by
the pianist and pedagogue Tobias Matthay in 1926 (Matthay, 1926; see also Matthay, 1913) and Piano
Technique written by the pianist Walter Gieseking and his teacher Karl Leimer in 1972 remain among
the best pedagogical sources available (Gieseking & Leimer, 1972). While it is important for piano
students to explore different types of memory strategies on their own, and to create their own
mental image of a piece, they are sometimes not given specific directions on how to memorize
their repertoire (Aiello, 1999).
Since performing from memory is an important part of the training and the skill required of many
classical pianists, research in this area could be useful to piano students and piano teachers and could
provide valuable information to psychologists. This paper is part of a series of interviews held with
classically trained pianists to gain a better understanding of how they memorize their repertoire.
Method
Semi-structured interviews were conducted with four professional classical concert pianists who had
extensive piano teaching experience. The participants were asked to describe the recommendations
they would give to a proficient piano student to memorize two compositions of different style: Chopin
Prelude in E minor, Op. 28, No.4, and J.S. Bach Prelude in C major from Book 1 of The Well
Tempered Clavier. Specifically, the participants were asked:
1. to describe the strategies they considered valuable to obtain a secure memory of each piece, and
2. to illustrate their suggestions on the score.
All four pianists had performed these pieces from memory sometime during the last few years. By
asking the participants to describe what strategies they would suggest to a capable piano student
instead of what strategies they would use themselves to memorize the pieces, it was hoped that they
would use clear and simple descriptions; their recommendations could therefore be interpreted as
addressing memory strategies at a basic level. The interviews were held in classrooms or piano studios
and were audio taped.
Results
The data were analyzed according to the principal themes that emerged from the interviews. They are
reported qualitatively.
With reference to the Chopin Prelude in E minor, Op. 28, No.4 (see Figure 1) the pianists'
recommendations focused mainly on the analysis of the piece.
They addressed in particular:
1. the overall form of the piece (i.e., a period made up of two long phrases);
2. the repeated melodic line occurring in measures 1-4 and measures 13-15;
3. the step-wise motion of the right hand in measures 1-9;
4. the harmonic changes in the left hand that take place throughout the piece, and the rate at which
the chords change;
5. the embellishment, the left hand pattern, and the crescendo that occurs on measure 16;
6. the overall climax of measure 17 due to the pattern of the left hand, the right hand reaching its
highest note, and the forte that should be reached here;
7. the very last chord of the piece.
All participants illustrated on the score what they described. One pianist drew a quick sketch and
explained how making a drawing would help her memorize this piece. She illustrated how the lines of
her drawing outlined the contour of the two phrases, the frequent changes in the left hand, and the
crescendo and the climax occurring on measures 16 and 17. The other three participants
recommended focusing in particular on the sounds of the chromatic left hand chords played
throughout the piece, and on the rate at which these chords change. One of them suggested
remembering the position of the left hand as the chordal changes occur.
With reference to the J.S. Bach Prelude in C major from Book 1 of The Well Tempered Clavier (see
Figure 2) the main themes that emerged from the participants' recommendations addressed primarily
the structure of the piece. The pianists explained that the entire piece is based on chord
progressions and that all the chords are arpeggiated throughout this prelude. They suggested blocking
the chords to understand their harmonic function, and to hear the chord progressions clearly. They
pointed out that, except for measures 33-34, each chord is repeated twice in the same position
and fills a measure.
Three pianists emphasized the importance of hearing the texture of the piece, and two of them spoke
of feeling the motion in the music. All discussed the importance of the phrasing in this piece, and
how they would create their phrasing. The pianists explained how the phrasing would reflect the
harmonic tensions and resolutions. Three of them stressed the importance of the bass line, and the
fourth spoke of the relevance of the brief coda in measures 33-35. They illustrated on the score what
they described. No references were made to any particular use of visual memory for this piece.
Discussion
For both pieces the pianists' suggestions revealed that their memory strategies were based primarily on
a detailed analysis of the music. Their responses emphasized mostly a cognitive, analytic approach to
the music. The main recommendations they gave were:
1. divide the pieces into sections according to their formal structure;
2. look for specific melodic and harmonic patterns;
3. block the chords to play the pieces as chord progressions.
Comments such as "Start with the whole so that the parts can make sense"; "Memorize in terms of
sections"; "Focus on all the patterns. Focus on what is different and what is similar in them" are
representative of the recommendations that were made. No pianist suggested memorizing either piece
by rote. They all illustrated on the scores what they described.
The references that these pianists made to the use of visual, kinesthetic, and auditory memory related
to their keen understanding of the scores. For example, the pianist who suggested remembering the
feeling of the left hand in the Chopin Prelude related it to the frequency of the chord changes in this
piece. He explained: "Memory is the balance between mental power and physical dexterity". And the
pianist who drew a quick sketch of this same prelude captured in her simple drawing some of the most
salient musical elements inherent in the score. These concert pianists' reliance on analytic memory is
in agreement with the data reported by Roger Chaffin and Gabriella Imreh who documented how a
concert pianist (the second author) memorized the Presto from J.S. Bach Italian Concerto (Chaffin &
Imreh 1994, 1996a, 1996b, 1997).
Further agreement between the performers in this study and the data reported by Chaffin and Imreh can
also be seen in the emphasis on dividing the scores into sections and creating phrasing that highlights
the structure of the music. It is possible that memorizing atonal music or contemporary pieces would
require the performers to apply different memory strategies than the ones described above to
memorize baroque and romantic music (Aiello, 1999; Marcus,1979). Comments such as: "The process
of discovery in a piece is what helps me creates my memory"; "If you think musically, memory will
follow", and "Understanding music as process helps me remember" provide rich food for thought for
both music teachers and psychologists. It is hoped that future research will address in depth the mental
representations of music performance taking into account different types of music and performers at
different levels of musical skill.
References
Aiello, R. (1999). Strategies for memorizing piano music: Pedagogical implications. Poster
presentation work-in-progress Eastern Division of the Music Educators National Conference,
February 26-28, 1999, New York, New York.
Aiello, R. (2000). Playing the piano by heart: From behavior to cognition. Poster session presented at
the Biological Foundations of Music Conference. The Rockefeller University, New York, NY, May
20-22, 2000. To appear in the Annals of the New York Academy of Sciences.
Aiello, R. & Williamon, A. (2000). Memorization. In R. Parncutt, & G. McPherson (Eds.), Science
and psychology of music performance. New York, NY: Oxford University Press. Forthcoming.
Chaffin, R., & Imreh, G. (1994). Memorizing for performance: A case study of expert memory. Paper
presented at the Third Practical Aspects of Memory Conference. University of Maryland.
Chaffin, R., & Imreh, G. (1996a). Effects of difficulty on practice: A case study of a concert pianist.
Poster presented at the Fourth International Conference on Music Perception and Cognition. McGill
University: Montreal, Canada.
Chaffin, R., & Imreh, G. (1996b). Effects of musical complexity on expert practice: A case study of a
concert pianist. Poster presented at the Meeting of the Psychonomic Society. Chicago, Il.
Chaffin, R., & Imreh, G. (1997). Pulling teeth and torture: Musical memory and problem solving.
Thinking and Reasoning, 3, (4): 315-336.
Chase, W.G., & Simon, H.A. (1973). The mind's eye in chess. In W.G. Chase (Ed.), Visual
information processing. New York: Academic Press.
Clarke, E. F. (1988). Generative processes in performance. In J. A. Sloboda, (Ed.), Generative
processes in music: The psychology of performance, improvisation, and composition. (pp.1-26).
Oxford: Clarendon Press.
Davidson, J.W. (1993). Visual perception and performance manner in the movements of solo
musicians. Psychology of Music, 21, 103-113.
Ericsson, K.A., Krampe, R.T. & Tesch-Romer, C. (1993). The role of deliberate practice in the
acquisition of expert performance. Psychological Review, 100, 363-406.
Gabrielsson, A. (1999). Music performance. In D. Deutsch (Ed.), The psychology of music, second
edition (pp. 501-602). San Diego: Academic Press.
Gieseking, W., & Leimer, K. (1972). Piano technique. New York: Dover Publications, Inc.
Gruson, L.M. (1988). Rehearsal skill and musical competence: Does practice make perfect? In J.A.
Sloboda (Ed.), Generative processes in music: The psychology of performance, improvisation, and
composition, (pp.90-112). Oxford: Clarendon Press.
Hallam, S. (1995). Professional musicians' approaches to the learning and interpretation of music.
Psychology of Music, 23, 111-128.
Hallam, S. (1997). The development of memorization strategies in musicians: implications for
education. The British Journal of Music Education, 14, 87-97.
Lehmann, A. (1997). Acquired mental representations in music performance: Anecdotal and
preliminary empirical evidence. In H. Jørgensen, & A. Lehmann (Eds.), Does practice make perfect?
Current theory and research on instrumental music practice (pp. 141-164). Oslo, Norway: Norges
musikkhøgskole.
Marcus, A. (1979). Great pianists speak. Neptune, NJ: Paganiniana Publications, Inc.
Matthay, T. (1913). Musical interpretation: Its laws and principles, and their application in teaching
and performing. Boston, MA: Boston Music Company.
Matthay, T. (1926). On memorizing and playing from memory and on the laws of practice generally.
Oxford: Oxford University Press.
Miklaszewski, K. (1989). A case study of a pianist preparing a musical performance. Psychology of
Music, 17, 95-109.
Miklaszewski, K. (1995). Individual differences in preparing a musical composition for public
performance. In M. Manturzewska, K. Miklaszewski & A. Bialkowski (Eds.), Psychology of Music
Today: Proceedings of the International Seminar of Researchers and Lecturers in the Psychology of
Music (pp. 138-147). Warsaw: Fryderyk Chopin Academy of Music.
Noyle, L. (1987). Pianists on playing: Interviews with twelve concert pianists. Metuchen, N.J.: The
Proceedings paper
Notes: Letters a) to q) represent the following theories (a to n from Ortony & Turner 1990, p. 316, Table 1): a) Arnold (1960)*; b) Ekman, Friesen & Ellsworth (1982); c) Frijda (1986); d) Gray (1982); e) Izard
(1971); f) James (1884)*; g) McDougall (1926); h) Mowrer (1960); i) Oatley & Johnson-Laird (1987); j) Panksepp (1982); k) Plutchik (1980); l) Tomkins (1984); m) Watson (1930)*; n) Weiner & Graham
(1984); o) Clynes (1982)**; p) De Vries (1990); q) Juslin (1998).
* these authors list "love" instead of "joy"
** Clynes differentiates "joy" and "love"
In sum, the question of basic emotions seems a matter of ongoing debate. A pragmatic solution in the search for basic emotions in music is therefore to start out with a rather broad repertoire. It was decided
to compile this set from the general theories and the theories of emotion in music outlined above. The resulting list of thirty-two emotion categories was used in the two parts of the empirical study, namely
the semantic ratings (data set one) and the survey of a database storing lyrics of songs and Lieder (data set two).
Data set one: Semantic ratings
Subjects: Forty-nine participants (35 female) were recruited from the university student population. Most subjects were at least moderately musically trained; the majority had received instruction on one
or more musical instruments for at least two years.
Questionnaire and procedure: Each subject received a questionnaire consisting of five pages (DIN A4). On the first two pages, subjects had to fill in demographic data and data on musical experience
and some aspects of their music consumption. The third to fifth pages contained a list of thirty-two emotion words, with one question at the top of each page. The questions were:
Can music represent each of the emotions given in the list? Can music evoke each of the emotions given in the list? And finally: Can music influence each of the emotions given in the list? Each page contained the
same set of items, but in a different order. Each category was rated on a five-point Likert-type scale, where ratings to the left of the midpoint indicated disagreement and ratings toward the right indicated
agreement. It was further explained that judgements should be based on pure musical sound and not in relation to pictures or lyrics with the music. Subjects were tested as groups in a seminar room.
Some individual subjects filled out the questionnaire at home. In the seminar, filling out the sheets took about twenty minutes. In general, the procedure posed no problems to the subjects.
Results and discussion
Figure 1: Mean ratings of categories on the basis of whether music might express, evoke, or influence a given emotion.
For the averaged ratings, significant correlation coefficients were found (Table 2). However, toward the centre of Figure 1, a number of individual items show significant mean differences, more of them
than test theory would predict for multiple comparisons of means. It is plausible that interest, boredom, mercy, and tiredness are evoked rather than represented by music (p < .01), whereas items such as
pain, loneliness, fear, despair, and pride reveal the reverse tendency (p < .01). A special status must be attributed to surprise: ratings on this item are lower for influence than for the other two questions.
Items that average around the midpoint of the scale and show higher variance seem particularly interesting; apparently some subjects attribute to music a certain effectiveness in representing, evoking, or
influencing these feelings. However, a much larger sample would be required to address the individual differences involved here.
Table 2: Product moment correlation coefficients between the three rating scales.
Evoke Influence
Express 0.80 0.85
Evoke 0.93
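The coefficients in Table 2 are ordinary product-moment (Pearson) correlations computed over the thirty-two category means. A minimal sketch with synthetic rating profiles; the shared component and noise levels are assumptions chosen only to produce mutually correlated profiles, not a model of the actual data:

```python
import numpy as np

# Pearson product-moment correlations among three 32-item rating profiles
# (express, evoke, influence). The profiles below are synthetic: a shared
# component plus independent noise, so the three scales correlate highly.
rng = np.random.default_rng(1)
base = rng.uniform(1.0, 5.0, size=32)          # shared rating component
express = base + rng.normal(0.0, 0.4, size=32)
evoke = base + rng.normal(0.0, 0.4, size=32)
influence = base + rng.normal(0.0, 0.4, size=32)

profiles = np.vstack([express, evoke, influence])
corr = np.corrcoef(profiles)    # 3 x 3 correlation matrix, rows = scales
print(np.round(corr, 2))
```

The off-diagonal entries play the role of the three coefficients reported in Table 2.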
Figure 2: Frequencies of emotion words in German (N=2933) and English (N=1785) songs in a lyrics database
The final step of the analysis sought to establish whether the occurrence of emotion categories is quantitatively related to their subjective importance, as assessed in the first part of this study.
For this purpose, averaged rating profiles and frequencies of emotion categories in the database were correlated, and coefficients of determination (indicating variance explained) were calculated (Table 3). In
general, frequencies are best predicted as a curvilinear, parabolic function of averaged subjective ratings. Moreover, the coefficients are highest when only frequencies above one percent are considered, and
particularly when the evoke and influence ratings rather than the express ratings are used.
Table 3: Correlation (coefficients of determination) between averaged ratings and frequencies of emotion categories in a song lyrics database.
Express Evoke Influence
Total R2 (linear) 0.26 / 0.25 0.15 / 0.14 0.21 / 0.19
R2 (parab.) 0.42 / 0.36 0.26 / 0.26 0.32 / 0.32
N > 1% R2. (linear) 0.24 / 0.21 0.30 / 0.21 0.32 / 0.29
R2. (parab.) 0.47 / 0.33 0.69 / 0.54 0.64 / 0.56
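The linear and parabolic coefficients of determination in Table 3 correspond to the R² of first- and second-degree least-squares fits. The sketch below uses synthetic data with an inverted-U relation; the simulated ratings and frequencies are assumptions for illustration, not the study's data:

```python
import numpy as np

# R^2 of a linear and a parabolic (degree-2) least-squares fit, as used to
# relate averaged ratings to emotion-word frequencies in Table 3.
def r_squared(x, y, degree):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
ratings = rng.uniform(1.0, 5.0, size=32)     # 32 emotion categories
# Inverted-U relation: mid-scale ratings get the highest frequencies.
freqs = -(ratings - 3.0) ** 2 + 4.0 + rng.normal(0.0, 0.4, size=32)

r2_lin = r_squared(ratings, freqs, degree=1)
r2_par = r_squared(ratings, freqs, degree=2)
print(r2_par > r2_lin)   # the curvilinear fit explains more variance
```

Because the degree-2 model nests the degree-1 model, its R² on the same data can never be lower; the interesting empirical fact in Table 3 is how much larger the parabolic coefficients are.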
General Discussion
Studies in the psychology of music have stressed the importance of language as a referential system for the psychological reality of music (see Fricke, 1999; Kleinen, 1999). The present investigation used
linguistic categories to determine emotion stereotypes in music. The relationship between everyday and musical emotions was determined on the basis of two independent sets of data. The first set
consisted of subjective ratings of thirty-two emotion words. Even in the absence of sounding music, it became clear that subjects differentiate emotions on the basis of whether they can be expressed,
evoked, or influenced by music. There was a significant tendency for music to be thought of as expressing rather than evoking a given emotion (F[2,56]=8.3; p<.002). Only a few emotions indicating
motivation or attitude (interest, boredom, tiredness) ran against this trend. High ratings were received by those emotions which according to Mowrer (1960) require no learning (pain, joy) and
which are presumably acquired before a social identity is fully developed (in particular love, sadness, desire, loneliness, anger). Other highly rated emotions indicated degrees of activation (unrest, relaxation).
In contrast, most social emotions (shame, pride, jealousy, disgust) play little or no role in music. However, other categorical systems may aid in interpreting these findings. Mees (1985)
distinguishes relationship, empathy and target as major categories of emotion. The latter are differentiated by evaluation, expectation, attribution and moral emotions. Finally, each group of emotions is composed of
positive and negative opposites. Relationship (love), evaluation (joy/sadness) and expectation (desire) are easy to identify in this system. But there is no clear categorisation for pain, which might not be an emotion
in Mees' theory. Considering the four basic emotions used in Terwogt & Van Grinsven's (1991) study, fear and anger were less identifiable than joy and sadness in music selections. Perhaps
emotions such as joy and sadness, which are both positive in the sense that they are highly appreciated in musical contexts, are more expected in music than fear and anger, which seem less
appreciated. In light of the present findings, joy and sadness also drop off when rated as evoked or influenced rather than as expressed. It might be that the identification of an emotional state in
music depends to some extent on resonance in, or involvement of, the listener. To summarise so far, the results suggest that distinctions need to be made between everyday and musical emotions. The
particular kind of emotion and the developmental stage of its acquisition seem to be the key factors. Music addresses those emotions which might be considered instinctive or which have a
physical basis.
A search for emotion words within a database of about 4700 German and English lyrics generated a frequency profile which correlated significantly with the semantic ratings. In other words, the subjective ratings predict to some degree the frequency of an emotion category (as represented in word fields) in lyrics across several centuries. Poets and composers seem more devoted to those emotions which music might evoke or influence, and less interested in the full range of expressible emotions. There is an undeniable dominance of the themes of "love", "pain", "joy", "sadness", and "desire", which renders any other emotion marginal.
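The word-field search and its correlation with the ratings can be sketched roughly as follows. Note that the miniature lexicon, the sample lyrics, and the rating values below are invented for illustration only; the study's actual lexicon (see the appendix) and rating data are far larger.

```python
import math

# Hypothetical word fields (cf. the appendix) and invented mean ratings;
# neither is the actual data of the study.
word_fields = {
    "love":    {"love", "beloved", "affection", "fondness", "cherish"},
    "pain":    {"pain", "anguish", "suffer", "distress", "hurt"},
    "joy":     {"joy", "pleasure", "glad", "happy", "delight"},
    "sadness": {"sadness", "sad", "grief", "sorrow", "mourn"},
    "fear":    {"fear", "afraid", "dread", "anxiety"},
}
ratings = {"love": 6.1, "pain": 5.4, "joy": 5.9, "sadness": 5.6, "fear": 3.0}

def field_frequencies(lyrics, fields):
    """Count, per emotion category, how many lyric tokens fall in its word field."""
    counts = {name: 0 for name in fields}
    for text in lyrics:
        for token in text.lower().split():
            token = token.strip(".,;!?")
            for name, words in fields.items():
                if token in words:
                    counts[name] += 1
    return counts

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

lyrics = ["my love, my beloved joy", "sorrow and grief bring pain", "glad and happy delight"]
freqs = field_frequencies(lyrics, word_fields)
cats = sorted(word_fields)
r = pearson([freqs[c] for c in cats], [ratings[c] for c in cats])
```

With this toy data the frequency profile correlates positively with the ratings, mirroring the kind of relation reported in the paper.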
In particular, even though love is not considered an emotion by most authors (cf. Table 1), it certainly indicates a fundamental human need and an orientation of social being. Explaining this phenomenon by mere joy seems inadequate. Some sociologically oriented empirical studies support a close relationship between love, sexuality, and music (Gembris, 1995; Kreutz, 1997, 1999). There is historical evidence that love was the subject matter of much music production long before this stereotype came under the rule of commerce in the industrial age. It should be noted here that poets and composers prefer to address motives (love, pain) in their lyrics rather than the commonly associated affects (joy, sadness). This preference most easily explains the inverted-U shape found in the correlation of the two data sets.
Finally, some limitations and perspectives of this study should be addressed. No sounding music was used here. It remains uncertain how exactly subjects interpreted the three questions, whether ratings were based on music in the subjects' minds or on other associations, and whether familiarity with a given emotion term influenced the ratings. Still, the procedure seemed efficient enough to generate a plausible pattern of emotion stereotypes in music, which is corroborated by previous studies and, again, by the correlation between the two independent data sets of the present study.
There is no apparent solution to the problem that emotions in poetry and lyrics do not necessarily surface in explicit linguistic labels, but often emerge from the interpretation of symbols, rhetoric
Appendix
N = 1865 unique English texts

Love (634): beloved, affection, fondness, liking, passionate, Cupid, cherish, enamoured, amorous
Pain (291): anguish, painful, pang, harmful, suffer, distress, trouble, tears, sigh, bitter, hurt, ache, achiness, twinge
Joy (378): joyousness, pleasure, pleasant, glad, rejoice, gladden, happy, enjoyment, happiness, delight, cheerful, enjoy, fortune, fortunate, elated, elation, funnies, gratification, blitheness, maffick, please
Sadness (236): sad, unhappy, grieve, grief, sorrow, deplore, inept, saddish, lamentation, lamento, mourn
Gunter Kreutz
Fear (122): danger, afraid, alarm, fearful, dread, anxiety, reverence, risk
Contempt (1): disrespect
Shock (1): concernment
Back to index
Proceedings abstract
Michael Grossbach
Michael.Grossbach@gmx.de
Background:
Aims:
To identify and differentiate neuronal networks involved in a putative
supra-modal analyser for processing temporal structures in both visual and
auditory sequences, this study investigates brain activation patterns in
healthy musicians during a same-different task with stimuli of both modalities.
Method:
Results:
Conclusions:
Back to index
Proceedings abstract
kaisu.korosuo@helsinki.fi
Background:
Aims:
In the other experiment, subjects were instructed to press the button every
time they detected a deviant tone.
Results:
It is possible that some neural correlates of tonal hierarchies can be found already at the non-attended level. Mismatch negativity (MMN), which reflects automatic change detection, appears to be largest for the dominant tone, which had the highest status in the tonal hierarchy among the deviant tones. In the attentive condition, ERPs to the dominant tone also differed from those to the other tones.
Conclusions:
The results demonstrate that the neurocognitive basis of tonal hierarchies can be revealed with ERP recordings despite experimental manipulations of the subject's attentional focus.
Back to index
Proceedings abstract
Helen Kuck
helen_kuck@yahoo.com
Background:
In the visual modality it has been found that global features are processed within the right hemisphere whereas local features are processed within the left hemisphere. Transferring these results to the auditory modality, we hypothesised that the processing of rhythm (a local time structure in music) might also be located within the left hemisphere, whereas the processing of metre (a global time structure) might be lateralized to the right hemisphere. No systematic study has yet investigated a comparable hemispheric dissociation in healthy subjects. Previous studies with epilepsy patients showed ambiguous and partly contradictory results.
Aims:
Method:
Results:
Conclusions:
As rhythm and metre were found to be processed in the same networks, it remains unclear whether these networks are specific to musical time structures or belong to a supramodal time-processing unit. Moreover, the additional activation of parietal areas during processing of rhythm points to higher demands on temporo-spatial integration of sensory information as compared to the more global metre task.
Back to index
Proceedings paper
INSIGHTS INTO THE FUNCTIONAL ORGANIZATION OF MUSIC PROCESSING REVEALED USING CONTINUOUS
ACROSS-SUBJECT EVENT-RELATED POTENTIAL AVERAGING
Douglas D. Potter (1,2), Helen Sharpe (1), Deniz Basbinar (1), Susan Jory (1)
(1) Keele University, UK (2) Now at University of Dundee, UK
d.d.potter@dundee.ac.uk
http://www.dundee.ac.uk/psychology/ddpotter/
Introduction
Recent functional imaging studies have shown that pitch and rhythm processing utilizes resources primarily in the left hemisphere and that timbre processing utilizes resources primarily in the right hemisphere. We report here preliminary findings using single-trial across-subject averaging of electrical brain activity associated with passive listening to unfamiliar pieces of music. The advantage of this technique is that one can observe unique and transient changes in the operation of perception, memory and attention mechanisms occurring over a time scale of seconds and minutes, but with a resolution of milliseconds.
Method
In this study participants listened to 3 different and unfamiliar pieces of music while the ongoing electroencephalogram was recorded at 500 Hz from 19 standard 10/20 locations on the head. A continuous, single-trial across-subject topographic map of activation (using an average reference) was generated from these continuous samples.
Results / Discussion
Only brief samples of the continuous recording are presented here. These can be viewed in the attached Shockwave files y7s8, f8s4 and f9s4 by clicking on the links when using Internet Explorer 5 or another browser with Shockwave capability. In simple terms, the blue patches indicate areas of high negative potential and the red areas high positive potential. In general, negative potentials often indicate sustained enhanced activation and positive potentials indicate transient inhibitory processes, but such simple interpretations do not cover the full range of possibilities. In the present images, blue and red are best treated as crude indicators of regions in which more activity is occurring or has occurred. Representative frames from these movies are illustrated below in Figures 1-3. It is clear from the movies that musical pieces with different structures evoke quite different patterns of activation. However, there are common features in these patterns that would be expected given the specific structural features of these stimuli. Both y7s8 and f9s4 have an abrupt onset to the music, and in these movies the first prominent feature is a fronto-central P3a that is associated with the brain rapidly orienting attention to this new stimulus. In the case of f8s4 the piece starts slowly, so a P3a is not obvious in this recording. In all the movies a more posterior positive feature is observed following the P3a; this would typically be classed as a P3b. In the examples given here this positive feature is quite variable in distribution, probably as a result of the differing structure of the pieces. In standard experiments that evoke P3b deflections and involve averaging of several trials within subject, the distribution is relatively diffuse. The P3b is believed to be made up of a number of distributed sources in the cortex as well as the hippocampus. In the present results more evidence of multiple sources can be discerned, possibly as a result of the trial-unique nature of the response. The P3b deflection is generally regarded as marking the operation of certain long-term memory processes.
Figure 1. y7s8. Single-trial across-subject average (n=32) based on 19 standard 10-20 positions. Potential distribution is projected onto a 3-D model of the head viewed from the rear. Example frames of the main event-related potential features observed during the first 8 seconds of passive listening to the music stimulus. Main features are a central P3a at 440 msec, a parietal P3b at 640 msec, a right posterior occipito-parietal "P3R" at 800/1720 msec, and right ventral-occipital / left frontal activation in the 800-3560 msec examples. Voltage range: +/- 10 microvolts.
Figure 2. f8s4. Parameters are as stated in Figure 1, except that potential distribution projection is onto a convex surface that allows a view of the entire potential map. (n=20)
References
Auzou, P., Eustache, F., Etevenon, P., Platel, H., Rioux, P., Lambert, J., et al. (1995). Topographic EEG activations during timbre and pitch discrimination tasks using musical sounds. Neuropsychologia, 33, 25-37.
Cabeza, R., & Nyberg, L. (1997). Imaging Cognition: An Empirical Review of PET Studies with Normal Subjects. Journal of Cognitive Neuroscience, 9(1), 1-26.
Kelley, W. M., Miezin, F. M., McDermott, K. B., Buckner, R. L., Raichle, M. E., Cohen, N. J., Ollinger, J. M., Akbudak, E., Conturo, T. E., Snyder, A. Z., & Petersen, S. E.
(1998). Hemispheric asymmetry for verbal and nonverbal memory encoding in human dorsal frontal cortex. Journal of Cognitive Neuroscience, 46-46.
Mazziotta, J. C., Phelps, M. E., Carson, R. E., & Kuhl, D. E. (1982). Tomographic mapping of human cerebral metabolism: auditory stimulation. Neurology, 32, 921-937.
Mazzucchi, A., Marchini, C., Budai, R., & Parma, M. (1982). A case of receptive amusia with prominent timbre perception defect. Journal of Neurology, Neurosurgery and
Psychiatry, 45, 644-647.
Petersen, S. E., van Mier, H., Fiez, J. A., & Raichle, M. E. (1998). The effects of practice on the functional anatomy of task performance. Proceedings of the National
Academy of Sciences of the United States of America, 95(3), 853-860.
Phelps, M. E., & Mazziotta, J. C. (1985). Positron emission tomography: human brain function and neurochemistry. Science, 228, 799-809.
Platel, H., Price, C., Baron, J. C., Wise, R., Lambert, J., Frackowiak, R. S. J., Lechevalier, B., & Eustache, F. (1997). The structural components of music perception - A
functional anatomical study. Brain, 120, 229-243.
Snyder, A. Z., Abdullaev, Y. G., Posner, M. I., & Raichle, M. E. (1995). Scalp Electrical Potentials Reflect Regional Cerebral Blood- Flow Responses During Processing of
Written Words. Proceedings of the National Academy of Sciences of the United States of America, 92(5), 1689-1693.
Zatorre, R. J., Evans, A. C., & Meyer, E. (1994). Neural Mechanisms Underlying Melodic Perception and Memory for Pitch. The Journal of Neuroscience, 14(4), 1908-1919.
Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of Phonetic and Pitch Discrimination in Speech Processing. Science, 256, 846-849.
Back to index
Proceedings abstract
aoyagi@usa.com
Background:
Aims:
This paper will analyze the reported and possible problems of the probe-tone
method, in particular the problems caused by cultural differences between the
experimenter and subjects. A new method of measuring a tonal hierarchy will
also be introduced. This new method is designed to be robust to these problems,
as it leaves less room for subjective interpretation.
Main contributions:
Until now, there has been only one method of quantifying the importance of
various pitches in music. The new method introduced in this paper may be
employed as an alternative to the probe-tone method when the latter is not a
viable means of data collection. It may also be used as a confirmatory method
for tonal ratings where the probe-tone method can be used.
Implications:
Many scholars have equated the results of probe-tone ratings with the tonal
hierarchy. This suggests that we have a tendency to resort to a single
methodology without examining its implications extensively. This paper suggests
the importance of studying music of other cultures, because it reveals that our
cultural bias is at work even in experimental studies.
Back to index
Proceedings abstract
LOCAL AND GLOBAL REPRESENTATIONS FOR MUSIC
wcooper@utdallas.edu
Background:
Tanaka and Farah (1993) found that parts of a face are better recognised when
presented in the context of a face, than the individual parts of the faces are
recognised when presented in isolation from the face.
Aims:
The current investigation seeks to find evidence indicating that the perception
of complex music, like the perception of faces, is reliant on a global
representation.
Method:
Using four-part hymn music, it will be determined whether accuracy in
identifying changes in an individual melodic line is higher when that line is
embedded in the context of three other accompanying melodic lines or when it is
presented alone.
Results:
Conclusions:
Back to index
Proceedings abstract
DETECTION OF UNEXPECTED PITCHES IN A MUSICAL CONTEXT
jdowling@utdallas.edu
Background:
Previous research has found that when listeners expect target tones in a
particular pitch region, targets falling outside those regions often go
undetected. That is, targets markedly different in pitch, typically by half an
octave, are more difficult to detect.
Aims:
This study aims to test whether, in addition, tones that are unexpected in
terms of musical structure would also be more difficult to detect. By
"unexpected" we mean not conforming to the musical scale structure of a cue
melody presented at the start of each trial. There are two levels of
expectation involved: pitches outside of the key but nevertheless part of the
"tonal material" in the culture (for example, a C# in the key of C), and
pitches outside of the tonal material (for example, the quarter step between C
and C#).
Method:
Listeners have their hearing thresholds assessed for this situation, and
perform a series of two-alternative forced-choice detection trials near their
individual thresholds. For each alternative the listener hears all but the
final note of a familiar cue melody normally ending on the tonic, followed (or
not) by a target tone. The listener has to say which alternative contained the
target tone. On most trials the target is in the expected pitch region, and on
those trials it is most often the expected tonic. On some trials the target is
approximately one-half octave higher or lower. One-third of the unexpected
targets are pitches of the musical scale in the key of the melody; one-third
are nonscalar semitones; and one-third are quarter steps.
Results:
Conclusions:
Musical structure in the form of the scale framework and the framework of tonal
material in a culture affects even the very early stages of pitch processing.
That is, it does not appear that a pitch is first perceived and then "encoded"
as a scale note or something else. Rather, even the detection of a tone is
affected by whether the pitch processing system expects something in that
category.
Back to index
Proceedings paper
Tone constellations: Stabile melodic intervals determine the best tonal fit
Erkki Huovinen, University of Turku, Finland
and "powerful" intervals was held by the perfect fifth and the perfect fourth. In this context,
Hindemith is an especially telling example, because he claimed that the most harmonically powerful
intervals stand out of the musical texture and determine the tonality; consequently, his methods of
musical analysis rely heavily on this conception. Despite these intuitions, there has been a lack of
studies on how the more "stabile" intervals might affect the formation of tonal centers, especially in
melodic contexts.
The Tonal Hierarchy Theory in its original form does not say much about what brings the feeling of
tonality about in the first place. In terms of intervals we are only told that certain intervallic
relationships to the pre-established tonal center are preferred over others. Information about the
tonality-forming powers of the intervals cannot therefore be gained by the usual probe-tone method of
the Tonal Hierarchy theorists - by asking experimental subjects how well certain tones fit into an
unambiguously given tonal center. A more appropriate method is the one adopted by Auhagen. In the
study mentioned above, he played short melodies to subjects, asking them in each case to produce a
suitable tonal center by operating a tone generator with twelve buttons, one for each pitch class. Auhagen
himself pays more attention to the effects of the temporal ordering of the tones, but his data can also
be used to test more structurally oriented intuitions about the importance of certain intervals to the
phenomenon of tonality. One part of the tone material used consists of 182 different five-tone pitch
strings. It turns out that for these strings, 211 (88%) of the 241 statistically significant tonal centers
reported by Auhagen form a fifth or fourth, that is, an interval of interval-class 5 (hereafter ic5), with
at least one of the other tones in the string. Some of the statistically significant tonal centers produced
by the subjects were tones outside the original five tones played for them, and so the total sum of pitch
classes to consider was 951. Of these, 602 (63%) formed an ic5 with one of the other tones in the
string. The 95% confidence interval for the difference between the proportion of ic5-forming tones
among the tonal centers and the proportion of ic5-forming tones among all tones used is therefore
0.25 ± 0.05. This means that forming an ic5 with some other tone in the string has clearly been a
desirable property for candidate tonal centers. This supports Hindemith's idea of fifths and fourths
between some (not necessarily successive) tones of the melody acting as guides to its tonal structure.
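The confidence interval quoted above can be reproduced with a standard normal approximation for the difference of two independent proportions; this is a sketch using the counts given in the text.

```python
import math

# Counts reported in the text: 211 of the 241 statistically significant tonal
# centers formed an ic5, versus 602 of all 951 candidate pitch classes.
p1, n1 = 211 / 241, 241   # ic5-forming tones among chosen tonal centers
p2, n2 = 602 / 951, 951   # ic5-forming tones among all tones considered

diff = p1 - p2                                                  # ~0.24
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)         # standard error
half_width = 1.96 * se                                          # ~0.05

print(f"95% CI for the difference: {diff:.3f} +/- {half_width:.3f}")
```

The computed difference (about 0.243 +/- 0.052) is consistent with the rounded 0.25 +/- 0.05 reported above.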
Results such as this point to the possibility of finding in single stabile intervals a positive structural
constraint for tonal center perception. It must be emphasized that stability is here understood as a
property assignable to intervals independent of context and not a property that is assigned to single
tones due to their intervallic relation to a given tonic. Of course, intervallic stability would be too
weak a criterion to work as a sufficient condition for locating the tonal center. This becomes
especially clear in a diatonic context if we regard the intervals of ic5 as stabile: each and every tone of
the diatonic set participates in an interval of ic5 with one or two other tones of the set. But then again,
this is no argument against the conception of ic5 as a stabile, tonality-promoting element. It has been
demonstrated (West & Fryer, 1990) that for the seven diatonic tones presented in random order,
musically trained listeners choose the mediant, subdominant or dominant as tonal center just as
readily as the major-mode tonic. This can be taken to show that when the effects of temporal ordering
are suppressed, there is not enough structural differentiation in terms of interval stability for the
tonality to be unambiguous. It can still be hypothesized that in a melodic context containing fewer
supposedly stabile intervals of ic5, listeners would orient their tonal decisions largely by these
intervals.
Tone constellations
The issue of choosing a tonal center seems often to be approached by asking "Given certain tones,
which one of them becomes accepted as the tonal center?" This implies a conception of tonality as
something that the producer of the music (such as a composer) builds into the music and that is
thereafter tracked down by the listener. If, however, we think of tonality primarily as something that
the listener puts into the music by way of focusing her attention on a certain pitch or pitch class, then
the question can be turned around. We can ask: "Which tones does the listener accept around a tonal
center?" As a tool for studying the tonal implications of particular interval classes, I would like to
introduce the concept of tone constellation, which can now be defined as the set of tones accepted
around a tonal center by the listener. Tone constellations can be best thought of as pitch-class sets
abstracted out of all tones present in the music and arranged around the chosen tonal center. Different
tone constellations are compared to each other by transposing them so that the tonal center always
corresponds to the same pitch-class, say, the class of all Cs. In this paper, pitch-classes (hereafter pc)
are referred to by the conventional number notation, whereby 0 = C, 1 = C#/Db, 2 = D etc. (see e.g.
Rahn, 1980).
In conjunction with the experimental method promoted by Auhagen, tone constellations allow us to
see whether the listener prefers certain intervallic relationships to the tonal center over others. To this
effect, the tonal material of the melodies used in the experiment is described as pitch-class sets
(hereafter pcsets), and for each subject, each pcset is transposed to move the chosen tonal center to
pc0. For example, if the melody consisted of the tones C, E, G and B, one subject might hear C as the
tonal center, whereas someone else might prefer E. The tone constellation for the first subject would
then be [C, E, G, B], or [0,4,7,11], and the constellation for the second subject would be [C, Eb, G,
Ab], or [0, 3, 7, 8]. In each case, pc0 represents the chosen tonal center. Note that both of these tonal
interpretations are perfectly explicable within a traditional tonal theory: the first subject has preferred
a "major" tonality, whereas the second one has opted for a "minor" tonality. By taking the tonal
intuitions of the listener seriously, this method thus allows for individual listening strategies to be
taken into account.
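The transposition step just described is simple to state in code; this minimal sketch reproduces the C-E-G-B example from the text.

```python
def tone_constellation(pcset, center):
    """Transpose a pitch-class set so that the chosen tonal center becomes pc0."""
    return sorted((pc - center) % 12 for pc in pcset)

melody = {0, 4, 7, 11}                      # C, E, G, B
print(tone_constellation(melody, 0))        # center C -> [0, 4, 7, 11], the "major" reading
print(tone_constellation(melody, 4))        # center E -> [0, 3, 7, 8], the "minor" reading
```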
The question concerning the stability of ic5 can now be easily formulated in terms of tone
constellations. We can ask, to what extent do the listeners use tone constellations with pc5 or pc7 in
them? In other words, is the listener likely to shift her tonal focus until a tonal center is found, which
stands in these favorable intervallic relationships to some other tones? It can be hypothesized, for
example, that if the heard melody consists of tones of a pcset that includes only one possibility of
forming an interval of a fourth or a fifth, then the listener will be likely to choose such a tone for the
tonal center that is a part of this stabile interval. This hypothesis was tested in an experiment that will
be described below.
Procedure
The tone material used in the experiment consisted of all five-tone pcsets whose interval vectors have
the number 1 as their second-to-last component. The interval vector (Forte, 1973), sometimes more
properly called the interval-class vector (Morris, 1987), is a description of the total intervallic content
of a pcset, enumerating the number of possible instances of each ic within the pcset. The interval
vector [254361], for example, describes the intervallic content of the diatonic set, where there are two
possible instances of ic1 (minor seconds/major sevenths), five instances of ic2 (major seconds/minor
sevenths), etc. The pcsets used in the experiment had the number 1 as the second-to-last component of
their interval vectors, indicating only one possible instance of ic5 (perfect fourths/perfect fifths). In
all, there are 20 such pcsets, and two of them always share a common interval vector. Below is a list
of the interval vectors, the corresponding pcsets used, and for reference also the customary "Forte
names" and the prime forms of the sets (see Forte, 1973). Note that every pcset used in the experiment
included pc0 and pc7, which together make up the only possible ic5.
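The interval-class vector described here is straightforward to compute by counting intervals over all pairs of pitch classes; this sketch reproduces the [254361] diatonic example above.

```python
from itertools import combinations

def interval_vector(pcset):
    """Count the instances of each interval class (ic1..ic6) within a pcset."""
    vec = [0] * 6
    for a, b in combinations(sorted(pcset), 2):
        ic = min((b - a) % 12, (a - b) % 12)   # interval class lies in 1..6
        vec[ic - 1] += 1
    return vec

print(interval_vector({0, 2, 4, 5, 7, 9, 11}))   # diatonic set -> [2, 5, 4, 3, 6, 1]
print(interval_vector({0, 1, 3, 7, 9}))          # a five-tone set with a single ic5
```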
Each pcset was used as material for one melody, which incorporated high speed and large intervals in
random-like succession to make functional cues (such as the temporal order of the tones)
insignificant in assessing the tonal center. First, the 20 pcsets were arranged in such an order that, by
transposition, there would be no common tones between successive trials where possible. The melodies
were then composed in 5/8 time and were all 18 measures in length, with every measure consisting of
all the pcs of the appropriate pcset. The registral space for each melody was principally G3-F#5, and
the registral position and order of the tones were changed from measure to measure so that no pc
could appear in the same registral position and/or the same part of the measure in more than two
consecutive measures. Three consecutive notes were not allowed to form any inversion of a major or
minor triad, and the same pitch class was not allowed to appear twice in succession at measure
boundaries. To avoid registral accents (see Huron & Royal, 1996), pcs 7-0, which would otherwise
have dominated the low register, were here and there moved up to G5-C6, and likewise pcs 1-6,
which would otherwise have dominated the high register, were moved down to C#3-F#3. This was
carried out according to the rule that if such a half-octave (pcs 1-6 or pcs 7-0) included two tones of
the pentachord, one of these would in turn be moved out of the principal registral space every four
measures, and if the half-octave included three of these tones, one would likewise be transposed every
three measures. (In some cases, the pentachords had to be transposed slightly to achieve this division
of the five tones into groups of two and three at the boundary of the two half-octaves.)
The melodies were then played using a Power Macintosh 7200/90 computer with Finale 3.5.1
software, with a flute-like sound chosen from the internal sound bank of the software program. The
duration of each tone was fixed at 150 ms, which made the duration of one measure 750 ms and the
total duration of each melody 13.5 s. The high speed of presentation, combined with the large interval
skips and unbiased ordering of the tones, was meant to hinder conscious choices of tonal center based
on familiar temporal and registral successions. Further, the activation of metrical accents and order
effects were minimized by providing the first two measures with a continuous crescendo from zero
amplitude and likewise the last two measures with a continuous diminuendo ending in zero amplitude.
The melodies were finally recorded on minidisc by a Sony MZ-R35 minidisc recorder and reproduced
in the experiment through a stereo system (amplifier Pioneer SA-510, loudspeakers Infinity
J814-200640) at a comfortable amplitude level.
The test subjects were 73 music professionals, students and amateurs aged between 17 and 53 years
(M = 27.43, SD = 9.13). All were tested individually. The subjects were asked to listen for a tonal
center in the melodies, where the concept of "tonal center" was explained to mean "the most stabile
tone in relation to the other tones". There were two practice trials before the 20 primary trials. During
each trial, the subject listened to the melody as many times as he/she wanted in order to be able to
sing, hum or whistle the tone felt to be most suitable as tonal center. The experimenter, who was out
of sight of the subject, checked all the answers on a keyboard with headphones before writing them
down.
These results already show, however, that there is much more individual variation in decisions
concerning tonality than one might expect. In the context determined by the pcsets described above,
ic5 was indeed found to act as a relatively strong criterion in determining tonality, but it still explains
only part of the results, as is clear from the figures given above. Here the concept of tone constellation
comes in handy. Tone constellations are arrived at by transposing the pcsets for each answer type so
that the chosen tonal center becomes pc0. For example, the set {0,1,3,7,9} received answers on nine
of the possible twelve pitch classes. Apart from the obvious pc0 (31 answers), there was a
considerable concentration of answers on pc3 (14 answers). By transposing the pcset a minor third
down we find the corresponding tone constellation {0,4,6,9,10}, which thus represents the auditory
impression of 19% of the participants. When all answer types have been treated in this way, the result
is a list of tone constellations with their respective frequencies of occurrence.
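The tallying procedure can be sketched as follows. The answer counts (31 for pc0, 14 for pc3) are those reported above for the set {0,1,3,7,9}; the answers on the remaining pitch classes are omitted here for brevity.

```python
from collections import Counter

def tone_constellation(pcset, center):
    """Transpose the pcset so that the chosen tonal center becomes pc0."""
    return tuple(sorted((pc - center) % 12 for pc in pcset))

pcset = {0, 1, 3, 7, 9}
answers = [0] * 31 + [3] * 14          # tonal-center answers for this pcset

# Map each answer to its constellation and count frequencies of occurrence
tally = Counter(tone_constellation(pcset, c) for c in answers)
print(tally[(0, 1, 3, 7, 9)])          # center pc0 -> 31
print(tally[(0, 4, 6, 9, 10)])         # center pc3 -> 14
```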
To simplify the analysis, subjects with divergent answers were left out. Originally it was hoped that
the relatively long duration (13.5 s) of the test melodies would constrain the subjects to find a tonal
center among the tones actually heard, rather than trying to "resolve" perhaps dissonant-sounding
melodies onto tones outside the pcset used. To a great extent, this is what happened: only 11 of the 73
participants gave an answer outside the pcset more than four times. As it was hard in some of these
cases to determine whether the subject was applying a consistent strategy or simply had "a bad ear",
these cases were left out of the final analysis. For the remaining 62 subjects, only 5.2% of the answers
fell outside the heard pitch classes.
When the tone constellations for the remaining subjects had been calculated, it was found that most
subjects had a tendency to include one or more particular pcs in their tone constellations. In other
words, certain intervallic relationships to the tonal center were often systematically favored by a
subject over others. This can be seen by listing those pcs that appear in, say, at least 10 of the 20
constellations of each subject. The four pcs to rise above this limit most often were pc7 and pc3 (both
in 35% of the 62 cases), pc9 (32%), and pc4 (29%). What is of interest here is the relatively small
overlap between these four groups of listeners. This is illustrated in Figure 1, which combines the
groups preferring pc3 and pc4. The three strategies considered are thus those in which the tone
constellations included (1) pcs [0,3] or [0,4], (2) pcs [0,7], and (3) pcs [0,9]. Together these three
strategies cover 89% of the cases, but the respective groups of subjects appear to be rather distinct.
That is to say, although the subjects often appear to use individually consistent tonal hierarchies,
these hierarchies can differ significantly from each other. What is more, the different strategies were
not found to correlate with differences in age or musical education. These results thus suggest that
there may be much more individual variation in listeners' mental representations of tonality than is
sometimes thought.
Figure 1. The three most favored intervallic strategies were to include in the tone constellation an
interval of major or minor third, fifth, or major ninth above the tonal center (or their complementary
intervals below it). The ellipses represent the groups of subjects who used one of these strategies in at
least half of the 20 tone constellations. The percentages show the relatively small overlap between
users of the different strategies.
These figures should still be regarded with caution, however, because no mention has been made of
the relative frequencies of occurrence for intervals in the pcsets used. The interval vectors shown
above reveal that some interval-classes are more common in the pcsets than others; as a consequence, it is no wonder that certain pcs turn up in the tone constellations more often than others, given that listeners mainly choose tonal centers from the tones in the pcset. For this reason, the relative frequencies of occurrence for intervals in each subject's constellations have to be normalized as if the
starting point had been equal for all intervals. This has been done here by first determining a mean
interval vector, which describes the average interval content of the pcsets used, and then dividing the
frequencies of occurrence for each interval by the appropriate number in the mean interval vector. For
the 20 sets used in the experiment, the mean interval vector is [2, 2, 2.2, 1.8, 1, 1]. The vector reveals,
among other things, that the popularity of pc3 and pc9 in the tone constellations was at least partly
due to their prominence in the tone material itself. If the frequencies of occurrence counted for pc3
and for pc9 are divided by 2.2 (the number for ic3 in the vector) and the similar operation is carried
out for other pcs, the results are seen in a different light. When all results are modified in this way, we
come up with measures for the relative desirability of pcs in tone constellations, which describe the
test subjects' strategies of tonal focusing in a context determined by the 20 pcsets, but normalized as if
the interval vector of the tone material was [111111]. Note that in case of the last component of the
interval vector the results must in addition be divided by 2, since a single ic6 in the pcset results in the
inclusion of pc6 in two (not one) of the five possible constellations built around the five pcs of the set.
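The normalization just described can be sketched as follows (a sketch under our assumptions, not the authors' code; the counts in the example are invented):

```python
# Divide each pc's frequency of occurrence by the corresponding
# component of the mean interval vector; pc6 is additionally halved,
# since a single ic6 in a pcset yields pc6 in two of the five
# constellations built around the set's pcs.
MEAN_IV = [2, 2, 2.2, 1.8, 1, 1]  # mean interval vector of the 20 pcsets

def desirability(counts):
    """counts: dict mapping pc (1..11) -> frequency of inclusion."""
    out = {}
    for pc, n in counts.items():
        ic = min(pc, 12 - pc)          # interval class of pc vs. pc0
        d = n / MEAN_IV[ic - 1]
        if pc == 6:                    # extra division for the tritone
            d /= 2
        out[pc] = d
    return out

print(desirability({3: 11, 9: 11, 7: 10, 6: 4}))
```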
The interval vector can easily be given a probabilistic interpretation (Lewin, 1977). On the assumption
that the subjects choose their answers solely from the five pcs heard, each pc (except pc6) has in the
normalized situation the probability 0.2 of being included in a tone constellation. Now it is also
reasonable to define a limit for statistical significance with respect to how often the subject has chosen
a certain pc in her tone constellations. If we choose 0.01 as the critical value, it turns out that, by the binomial distribution, the listener has to opt for a given pc nine or more times out of 20 (p = 0.00998)
in order for the pc to be a statistically significant inclusion in the constellations. It turns out that for 38
subjects of the 62 under consideration, the desirability of at least one pc rises above this limit.
However, after the normalization process these desirable pcs only include pc7 (for 52% of the
subjects) and pc5 (for 15% of the subjects). The overlap of the respective subject groups is still
relatively small: the two significantly desirable pcs coincided at the same subject in only 3 cases out
of 62.
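The binomial threshold above is easy to verify (a quick check, not part of the original analysis):

```python
# With p = 0.2 per constellation and 20 constellations, how many
# inclusions of a pc are needed for P(X >= k) to fall below 0.01?
from math import comb

def tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(tail(20, 9, 0.2), 5))  # 0.00998: nine inclusions suffice
print(round(tail(20, 8, 0.2), 5))  # 0.03214: eight are not enough
```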
Figure 2 illustrates the relative desirability of the pcs 1-11 as inclusions in the listeners' tone
constellations. For each pc, the broken line represents the factual average proportion of inclusion in
tone constellations (the results before normalization); this corresponds to the situation illustrated
earlier in Figure 1. The continuous line in turn represents the average desirability (the normalized
results). It is easy to see that the broken line somewhat misrepresents the situation as it gives the
impression that pc3 and pc4, for example, would have been especially favored and pc5 the least
wanted companion to the tonal center. However, the relative absence of pc5 in the actual tone
constellations is due to the structure of the pcsets used, all of which included only one possibility for
an interval of ic5 in them. By choosing pc7 in their constellations the listeners automatically cancelled
out pc5, and vice versa. This is also reflected in the relatively large dispersion of the desirability
measures of these two pcs, which only repeats what was said in the preceding paragraph: many
subjects had a strong tendency to choose mostly either pc7 or pc5 in their tone constellations. The
relatively high average desirability of pc5 shows that pc5 was not at all neglected in the process of
tonal focusing as the non-normalized results appear to imply. On the contrary, it was used as much as the
structure of the pcsets allowed it to be. The methodological lesson to be learned is that unless the
intervallic possibilities of the tone material are properly taken into account, the results of studies in the
perception of tonality will fail to reflect the nature of the decision-making process of tonal focusing.
What the listener hears is not always what she seeks to hear. What she strives for is not always what
she gets.
Figure 2. The average proportion of inclusion in tone constellations for pcs 1-11 is given by the
broken line, and the relative desirability by the continuous line with standard deviation error bars.
It is true that the measures of desirability outlined above do not reflect the actual auditory impressions
of the listeners, which are indeed more faithfully conveyed by the results before normalization. What
they do reflect is the listeners' attitudes towards particular intervallic relationships to the tonal center.
While pc3, pc4, pc6, pc8 and pc9 have been readily accepted by the subjects to appear alongside a
tonal center of pc0, there has still not been much attempt to force them into the tone constellations any
more than the tone material naturally gives occasion to. The intervals belonging to ic5, in turn, have
proved their supposed stability by exerting a considerable pull towards themselves in the listeners'
process of tonal focusing. For typical Western listeners, these stable intervals help to determine the
best tonal fit, the best way to make sense out of new and unfamiliar melodies. However, this is only
one side of the coin. Most of the rules for intervallic strategies of tonal hearing in melodies are no doubt functionally oriented, that is, concerned more with the temporal and registral relationships of
the tones. The present study nevertheless indicates a desirable structural condition for these
functionally organized melodies to fulfill: if possible, the tonal center must be part of a stable interval
belonging to ic5. This structural interval will then act as the basis on which particular intervallic
contexts can develop.
On the whole, tonal focusing is a complex phenomenon that may be subject to a substantial amount of
variability among listeners. Individuals may have different criteria for intervallic stability, that is,
different criteria as to which intervallic relationships to the tonal center are to be considered tonally
fitting. Approaching the problem in terms of tone constellations helps to pay attention to these
individual strategies of tonal hearing.
References
Auhagen, W. (1994). Experimentelle Untersuchungen zur auditiven Tonalitätsbestimmung in
Melodien. Teil 1: Text. Kassel: Gustav Bosse Verlag.
Balzano, G. J. & Liesch, B. W. (1982). The role of chroma and scalestep in the recognition of
musical intervals in and out of context. Psychomusicology, 2, 3-31.
Browne, R. (1981). Tonal implications of the diatonic set. In Theory Only, 5(6-7), 3-21.
Butler, D. (1989). Describing the perception of tonality in music: A critique of the tonal
hierarchy theory and a proposal for a theory of intervallic rivalry. Music Perception, 6,
219-242.
Forte, A. (1973). The structure of atonal music. New Haven, CT: Yale University Press.
Hindemith, P. (1937). Unterweisung im Tonsatz. I. Theoretischer Teil. Mainz: B. Schott's
Söhne.
Huron, D. & Royal, M. (1996). What is melodic accent? Converging evidence from musical
practice. Music Perception, 13, 489-516.
Jeffries, T. B. (1974). Relationship of interval frequency count to ratings of melodic intervals.
Journal of Experimental Psychology, 102, 903-905.
Killam, R. N., Lorton, P. V., Jr., & Schubert, E. D. (1975). Interval recognition: Identification of
Proceedings paper
Introduction
perception of accompanied and unaccompanied tone sequences. This has resulted in a dispute about whether
melody and harmony are mutually influencing dimensions or perceptually independent and additive components of
the musical stimuli (Povel & Van Egmond, 1993; Thompson, 1993). Thompson (1993) proposed the notion of a
partly hierarchical connection between key, harmony and melody.
Other studies have concentrated on the perception of specific groups of tones or chord changes in melodies. Povel
and Jansen (1998), for instance, showed that listeners are capable of recognizing arpeggiated chords in a series of
tones. Listeners who judged a number of tone sequences on their musical goodness generally gave higher ratings
to sequences that only contained chord tones, or sequences containing a non-chord tone that is linked (anchored) to
a closely following chord tone. From the study it was concluded that chord recognition and anchoring are important
mechanisms in the perception of melodic sequences.
Platt and Racine (1994) showed that listeners are capable of detecting a chord change occurring in a sequence of
arpeggiated tones from a single triad. Other researchers have shown that listeners perform less well with tasks in
which they have to detect melodic alterations when these alterations conform to the implied harmony than when
they violate it (Trainor & Trehub, 1994; Holleran, Jones & Butler, 1995). In addition, the ability to detect violations
of implied harmony has been reported to develop at a later age than the ability to detect violations against the key
(Trainor & Trehub, 1994), suggesting the involvement of a skill acquired through exposure, rather than being an
inherent characteristic of key or diatonic structure.
From a somewhat different perspective, Schmuckler (1989) investigated melodic and harmonic expectations by
collecting listeners' responses for a number of probe tones and arpeggiated probe chords at several sequential
positions in a musical phrase. The results of the harmonic probe chords were shown to be in accordance with
Piston's "Table of usual root progressions" (Piston, 1973). This table denotes, for the triads on each of the diatonic
scale degrees, which other chords follow a) most often, b) less often, or c) seldom.
In sum, several studies have pointed out the relevance of chords and chord changes for the perception of melodic
sequences. However, these studies have not addressed one possible implication for music perception: If harmony
underlying melody conveys perceptually important information, listeners should demonstrate the (explicit or
implicit) evaluation of chord progressions in processing single tone sequences. If this can be shown, it follows that
listeners are capable of recognizing a number of tones as a chord, then keeping a representation of this chord in
short term memory, while the next few tones are being recognized as another chord, enabling the final step of
evaluation of the chord progression.
Present approach
The present approach focuses on the question whether listeners evaluate the quality of a chord change. The aim of
this study is to investigate the role of implied harmony in the perception of a melodic line by using a paradigm in
which listeners rate the melodic goodness of tone sequences. Two contrasting hypotheses were formulated:
The first hypothesis states that listeners do not mentally represent the relation between chords that are induced when
listening to a tone sequence. This implies that after having heard the sequence only the last chord is perceptually
relevant. This hypothesis is based on the outcome of studies that have shown that music perception is a relatively local process (e.g. Povel & Jansen, 1998; Bigand & Parncutt, 1999). The second hypothesis states that listeners do
represent and evaluate the relation between a succession of implied chords.
These hypotheses were tested in an experiment using a number of tone sequences only containing sequential
intervals larger than a major second which may be conjoined into chords. Four categories of tone sequences were
formed, corresponding to four different harmonic progressions employing the I, IV, and V triads of the major
diatonic scale. Listeners rated the melodic goodness of the sequences.
The predictions of the two contrasting hypotheses are:
I. On the hypothesis that listeners base their goodness rating on the last chord independently of the first chord, it is
predicted that sequences ending on the dominant triad (V) are rated higher than sequences ending on the
subdominant triad (IV).
II. On the hypothesis that listeners base their goodness rating on the relation between the two chords in the
sequence, it is predicted that there is a context effect of the first chord on the final chord, parallel to the usualness of
the chord relation. These predictions are based on the Table of usual root progressions by Piston (1973).
The effect of contour structure is also investigated. It is hypothesized that a simple contour structure increases the
perceptual goodness of tone sequences. Complexity of contour structure is therefore defined in terms of the number
of contour changes within segmented subgroups in the sequences (as explained below). Thus, it is predicted that
simpler contours are rated higher than complex ones.
Experiment
Method
Participants
Twenty-five listeners, students and staff of the Psychology Department of the University of Nijmegen, with various
degrees of musical experience participated in the experiment. Most of the participants practiced or had practiced a
musical instrument (ranging from 3 to 25 yrs), in most cases the violin, closely followed by piano and guitar.
Listeners were from different musical backgrounds, mainly tonal classical and mainstream pop music. Age ranged
between 18 and 38 yrs, with a median of 25.
Stimuli
Thirty-two 6-tone sequences only containing intervals larger than a major second were constructed by crossing 4
harmonic progressions with 8 contours, grouped into 4 contour categories, as described below. The stimuli are
shown in Table 1. The design was a 2-factor within-subjects design: Progression (4 levels) x Contour (4 levels).
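The 4 x 8 crossing can be sketched as follows (labels are ours, purely illustrative):

```python
# Illustrative reconstruction of the stimulus design: 4 harmonic
# progressions crossed with 8 contours (4 contour groups x 2 variants)
# yields the 32 six-tone sequences.
from itertools import product

progressions = ["I-IV", "I-V", "IV-V", "V-IV"]  # V-IV: Piston Category 2
contours = [(group, variant) for group in (1, 2, 3, 4) for variant in "ab"]

stimuli = list(product(progressions, contours))
print(len(stimuli))  # 32
```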
1) Harmonic categories
Each 6-tone sequence consisted of an arpeggiated triad followed by another arpeggiated triad. These triads were the
Tonic (I), the Subdominant (IV), or the Dominant triad (V) within the same key. Pitch height differences between
the pcs present in the tone sequences were kept as small as possible.
The first triad was either a I, IV, or V chord. The choice of the second triad was based on the Table of usual root
progressions by Piston (1973). If the first chord was a I, the second chord was either a IV or a V, producing the
progressions I-IV and I-V, both quite usual (Category 1) progressions according to Piston's table. If the first chord
was a IV the second chord was a V, resulting in the chord progression IV-V, which is also a Category 1 progression.
If the first chord was a V, the second was a IV, producing a V-IV progression, which is less common (Category 2)
according to Piston's table.
2) Contour configuration
To investigate the effect of contour (here defined as the pattern of contour changes), the four harmonic categories
were crossed with 4 contour categories. These 4 groups of two sequences each differed with respect to the number
of directional changes in the entire sequence and on the relative position of the change(s) within the tones of the
first and the second triad. Group 1 contained 2 direction changes in total, but none within each of the two triads (in
fact, both triads were parallel upward or downward). In Group 2 the sequences had 3 changes, the first triad forming
a linear contour motion, and the final triad containing one change. In Group 3 this was reversed: 3 changes in the
entire sequence, the first triad containing a contour change, the final triad following a linear motion. In Group 4, a
total of four contour changes resulted in both triads containing a contour change (note that no linear motion was
present in this category).
Differences in melodic accents (defined here as peaks in the pitch contour) and pattern of intervals were minimized.
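The contour classification can be made concrete with a small sketch (our illustration; the pitch values are invented MIDI note numbers):

```python
# Count direction changes in a 6-tone sequence: the total number of
# changes, plus whether a change falls inside the first triad (between
# intervals 1 and 2) or inside the second triad (between intervals 4 and 5).
def contour_changes(pitches):
    dirs = [1 if b > a else -1 for a, b in zip(pitches, pitches[1:])]
    change_at = [i for i in range(1, len(dirs)) if dirs[i] != dirs[i - 1]]
    within_first = int(1 in change_at)   # change inside tones 1-3
    within_second = int(4 in change_at)  # change inside tones 4-6
    return len(change_at), within_first, within_second

# Group 1 example: two parallel upward triads, 2 changes in total but
# none within either triad.
print(contour_changes([60, 64, 67, 59, 62, 65]))  # (2, 0, 0)

# Group 4 example: a change within each triad, 4 changes in total.
print(contour_changes([60, 64, 59, 67, 62, 65]))  # (4, 1, 1)
```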
3) Timing and articulation
Presentation of the stimuli was manipulated to induce a segmentation in two groups of three tones (see also Jansen
& Povel, 1999), by stressing the first and the fourth tone. Timing and articulation of the tones was based on the
recorded key-press velocities, tone durations and IOIs of a triple meter pattern played by one of the authors, such
that tone IOIs were approximately 500 ms. The parameter values of the tones 1 to 6 are as follows: Velocities
(1-127) were 80, 66, 59, 80, 66, and 59; IOIs (in ms) were 514, 483, 496, 503, and 509; and durations (in ms) of the
tones were 514, 483, 315, 503, 509, and 297.
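The timing parameters above can be assembled into note events (a sketch; the event structure is ours, not the authors'):

```python
# Build onset times for the six tones from the five IOIs, then pair the
# onsets with the durations and velocities given in the text.
velocities = [80, 66, 59, 80, 66, 59]       # MIDI-style velocities, 1-127
iois = [514, 483, 496, 503, 509]            # ms between successive onsets
durations = [514, 483, 315, 503, 509, 297]  # ms

onsets = [0]
for ioi in iois:
    onsets.append(onsets[-1] + ioi)

events = list(zip(onsets, durations, velocities))
print(onsets)  # [0, 514, 997, 1493, 1996, 2505]
```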
Each tone sequence was preceded by a cadence V7-I to induce a major key. In order to strengthen the induction of
triple meter, the chord V7 was played three times and the chord I once using the same timing and articulation of the
first four stimulus tones. The cadence was followed by a 1041 ms silence before the tone sequence was sounded, in
order to comply with the induced meter.
Apparatus
The stimuli were generated by a Yamaha PSR-620 keyboard set to the sound Jazz Organ 1. The volume of the
keyboard's stereo internal speakers was set to a comfortable listening level. Both stimulus presentation and response
collection were controlled by a Macintosh 4400 Power PC computer, running a custom written computer program.
Procedure
Participants were seated at a desk facing the computer screen. The computer screen showed a horizontal array of 7
radio-buttons numbered 1 to 7 (left to right), and two buttons below. The numbered radio-buttons served as a
7-point scale ranging from "bad" (1) to "good" (7) and a response was given by clicking one of these buttons. By
pressing the bottom-left button, labeled "play" or "repeat" (depending on whether a stimulus had already been
listened to for the first time or not), a stimulus was presented. The bottom-right button, labeled "next" was pressed
to proceed to the next trial.
To start a trial the participant clicked the "play"-button to listen to a stimulus. Next, the listener answered the
question "How good is this tone sequence as a melody?" by clicking one of the 7 buttons representing the scale.
Participants could repeat a stimulus by pressing the play button again. The number of repetitions was not restricted,
nor was the time to provide a response. Finally, after having provided a response the participant clicked the
bottom-right button to continue with the next trial.
The experimental trials were preceded by a number of training trials, during and after which the participant was
allowed to ask questions regarding the procedure. The experiment proper followed, in which the test sequences
were presented in a different random order for each participant. The pitch height of the trials was quasi-randomly
varied between 2 semitones below and 3 semitones above C4, with consecutive trials never in the same
transposition. After the experiment, the participant was asked to comment on the response strategy followed or
anything that came to mind.
Results
The mean intersubject correlation was .323 (p<.0001). A MANOVA was performed to examine the effects of the
factors Progression and Contour (4 by 4 Repeated Measures). Both the main effect for Progression (F(3,22)=7.254,
p=.0015) and for Contour were significant (F(3,22)=33.764, p<.0001). A statistical test of the interaction between
Progression and Contour was also significant (F(9,16)=3.186, p=.0209).
The means for the 4 categories of Progression (I-IV, I-V, IV-V, and V-IV), were 4.57, 4.26, 4.53, and 3.84,
respectively (see Figure 1a). After Bonferroni correction for all possible pairwise comparisons, the difference
between I-IV and V-IV (F(1,24)=15.90; p=.0005), and the difference between IV-V and V-IV (F(1,24)=15.31;
p=.0007) were statistically significant. A planned comparison combining groups with the same final triad (I-IV and
V-IV, vs. I-V and IV-V) was not significant (F(1,24)=1.836; p=.1880), while a comparison between Category 1
(I-IV, I-V, and IV-V) and Category 2 (V-IV) progressions of Piston's table was indeed significant (F(1,24)=15.351;
p=.0006).
The means for the 4 Contour categories (groups 1-4) were 5.50, 3.76, 4.14, and 3.82, respectively (see Figure 1b).
Pairwise planned comparisons of the differences between these means showed that Contour group 1 differed
significantly from the other three groups: Group 2 (F(1,24)=84.871; p<.0001), Group 3 (F(1,24)=87.698; p<.0001),
and Group 4 (F(1,24)=71.483; p<.0001), again after Bonferroni correction.
The interaction of the factors (shown in Figure 2) was inspected further, both visually and by means of interaction
contrasts. For each Progression the effect of Contour group is approximately the same. The differences between
Contour group 1 vs. groups 2, 3, and 4 are statistically significant within each progression. Interaction contrasts
showed that for the progression V-IV, also the differences between Contour Groups 3 vs. 2 (F(1,24)=6.919;
p=.0147), and Groups 3 vs. 4 (F(1,24)=5.744; p=.024) are significant.
Discussion
The results show that both implied harmonic progression and contour structure played a role in the perception of the
6-tone sequences used in the experiment. Overall, apart from some large pairwise differences, the effects found are
subtle rather than pronounced. The interaction of progression with contour shows that both factors are engaged in an
interdependent relation. Interpretations of the effects in terms of the hypotheses are given below.
The ratings for the harmonic progressions show that listeners, at least in part, base their goodness judgment of a melodic sequence on its implied harmonic structure. In particular, listeners appear not to base their response solely on the recognition of the final triad in a sequence, as shown by the nonsignificant contrast between categories with the same final chord. Rather, their responses can only be explained by including the influence of the first chord, and thus by assuming the perception of chord changes, as concluded from the significant contrast between Category 1 and Category 2 progressions in terms of Piston's table. Therefore the first hypothesis stated in the introduction is
rejected in favour of the second hypothesis: listeners are capable of holding an implied chord in STM, while
recognizing the next chord, and they evaluate the perceptual quality of the harmonic transition.
The pairwise differences between progressions were unexpected in one respect: the generally accepted fundamental nature of I-V suggests that it would be rated highest of all progressions, yet it was rated lowest of the Category 1 progressions. Furthermore, the difference between I-V and V-IV did not reach significance. A tentative explanation for this finding may be that I-V is too direct a transition, whereas I-IV and IV-V are more subtle, and therefore perceptually preferred.
The results for the contour variable show that the sequences with the simplest contour structure (contour
category 1; parallel linear motion) are rated highest by far. The data show a tendency for a single linear motion in
the final triad to be judged slightly higher than when it appears in the initial triad, although this difference did not
reach statistical significance. Thus, the present results show that listeners are sensitive to contour information and
tend to rate the musical goodness of simpler contours higher than more complex contours.
The deviant effects for contour on the V-IV progression (as indicated by the statistical interaction between progression and contour) may be interpreted as follows. Music perception can be seen as a process aiming at constructing a suitable musical interpretation of the auditory signal, employing a number of perceptual mechanisms to analyse the content of the input for musical features. The strategy followed seems to be solutionist rather than perfectionist: the ultimate aim of the process is to produce an interpretation at all, and any mechanism available to interpret the elements will achieve that aim. In such a framework, mechanisms that fit well are most likely preferred over mechanisms that fit less well. In this light, the larger effects for contour within the lowest-rated progression V-IV can be explained by the supposition that when a harmonic analysis is less appropriate, contour analysis has a more pronounced influence on the responses.
The present results are in accordance with Schmuckler (1989), although his second experiment investigated
harmonic expectation rather than perception. Taking into account the fact that we used only 4 different progressions
instead of all possible chord changes, the absolute difference on a 7-point scale between Category 1 changes
(average of 4.45) and Category 2 changes (3.84; absolute difference of .61) is comparable to the difference found by
Schmuckler (Category 1: 4.92; Category 2: 4.25; difference of .67). In contrast to Schmuckler's study, ours does not explicitly require listeners to evaluate a chord change, as in a probe tone/chord study, but examines whether they
evaluate the progression spontaneously. Nevertheless, the agreement between our results and Schmuckler's suggests
that the perception of harmony in melodies follows the same rules as harmonic expectancy for melodies. It also
stresses the importance of the concept of expectation for harmonic factors in melody perception.
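The figures in the comparison above check out (values taken from the text):

```python
# Category 1 mean across the three usual progressions vs. the single
# Category 2 progression (V-IV), and Schmuckler's corresponding gap.
cat1 = [4.57, 4.26, 4.53]    # I-IV, I-V, IV-V
cat2 = 3.84                  # V-IV

avg1 = round(sum(cat1) / len(cat1), 2)
print(avg1)                  # 4.45
print(round(avg1 - cat2, 2))  # 0.61
print(round(4.92 - 4.25, 2))  # 0.67 (Schmuckler, 1989)
```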
The present study might be criticized for its use of artificial stimuli, as a consequence of which its results would not generalize to the process of perceiving real music. This would be a legitimate criticism if the present study had pretensions to generalize directly to music listening in everyday life. However, one should not confuse the object of a study with its method. Of course, the major goal of our research is to examine the process of music listening as such, but the experimental method is only exploited to its full potential if the rules of conducting experimental research are followed. In this case, that means constructing musical stimuli in which the variables of interest are manipulated categorically. Only such a method allows the systematic investigation of the hypothesized mechanisms of music
perception. Once the mechanism has been firmly established experimentally, predictions concerning the perception
of more realistic musical stimuli can be derived and tested.
In closing, the results of this study show that the construction of a mental representation of a melody is based on a
description of the sequential structure on the one hand, and a representation of the underlying harmony on the other.
This finding is in line with the notion that the perceptual analysis of musical information involves both general
auditory principles, which guide the process by grouping the elements and providing sequential structure
representations, as well as strictly musical principles which analyze harmonic structure in the process of listening to
music.
References
Bharucha, J.J. (1984). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology,
16, 485-518.
Bigand, E. (1997). Perceiving musical stability: The effect of tonal structure, rhythm, and musical
expertise. Journal of Experimental Psychology: Human Perception and Performance, 23, 808-822.
Bigand, E., & Parncutt, R. (1999). Perceiving musical tension in long chord sequences. Psychological
Research, 62, 237-254.
Cuddy, L.L., Cohen, A.J., & Mewhort, D.J.K. (1981). Perception of structure in short melodic
sequences. Journal of Experimental Psychology: Human Perception and Performance, 7, 869-882.
Holleran, S., Jones, M.R., & Butler, D. (1995). Perceiving implied harmony: the influence of melodic
and harmonic context. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21,
737-753.
Jansen, E.L., & Povel, D.J. (1999). Mechanisms in the perception of accented melodic sequences.
Proceedings of the 1999 Conference of the Society for Music Perception and Cognition. Evanston, Ill.,
p. 29.
Krumhansl, C.L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126, 159-179.
Lerdahl, F. (1988). Tonal pitch space. Music Perception, 5, 315-350.
Piston, W. (1973). Harmony (7th ed.). London: Victor Gollancz Ltd.
Platt, J.R., & Racine, R.J. (1994). Detection of implied harmony changes in triadic melodies. Music
Perception, 11, 243-264.
Povel, D.J. (1996). Exploring the elementary harmonic forces in the tonal system. Psychological
Research, 58, 274-283.
Povel, D.J., & Van Egmond, R. (1993). The function of accompanying chords in the recognition of
melodic fragments. Music Perception, 11, 101-115.
Povel, D.J., & Jansen, E.L. (2000). Towards an on-line model of music perception. These proceedings.
Povel, D.J., & Jansen, E.L. (1998). Perceptual mechanisms in music perception. Internal Report NICI.
Schmuckler, M. (1989). Expectation in music: Investigation of melodic and harmonic processes. Music
Perception, 7, 109-149.
Thompson, W.F. (1993). Modeling perceived relationships between melody, harmony, and key.
Perception & Psychophysics, 53, 13-24.
Trainor, L.J., & Trehub, S.E. (1994). Key membership and implied harmony in Western tonal music:
Developmental perspectives. Perception and Psychophysics, 56, 125-132.
Proceedings paper
[Editor's note: This paper contains a number of symbols and characters from a font set that we were unable to print or read. In most cases these are mathematical,
sound pressure level or Hz symbols and the meaning should be clear. Additionally some aspects of the figures, particularly graph axis labels, were unstable. We have
reproduced the paper as well as we are able but advise checking with the original document which is stored as originals\posters2\kwak.doc]
I. Introduction
Most authors today seem to accept the statement by Terhardt (1974a) that "no roughness is produced by pure-tone pairs exceeding the critical band". He pointed out that the V-shaped curves presented by Plomp and Levelt (1965) and by Kameoka and Kuriyagawa (1969a, 1969b) exhibit no singular points corresponding to simple frequency ratios, and concluded that "frequency distance rather than frequency ratio is the decisive parameter of the consonance of pure-tone intervals". However, Terhardt's statement requires correction, for the following five reasons:
1. Past experimental studies on tonal consonance and dissonance were mostly limited to simple intervals;
2. Even for pure-tone pairs exceeding the critical bandwidth, roughness can be produced by aural harmonics (harmonic distortion) in mistuned consonances (1:2, 1:3);
3. The direct interaction model presented in Plomp's (1967) study of the beats of mistuned consonances needs to be corrected, because it does not distinguish between the threshold of pitch perception and that of loudness perception;
4. As an appropriate traveling-wave model for pure-tone pairs evoking roughness, a composite model, a compromise between Plomp's direct interaction model and Clack's aural-harmonics model, must be considered;
5. Aural harmonics are significant parameters in the perception of tonal dissonance.
The aspects of graphs b, c, and d appear slightly different from those of graphs a and e. What graphs b, c, and d have in common is that the monotonic V-shaped curve, which holds while the frequency difference of a pure-tone pair is within the critical bandwidth, breaks down once the frequency difference exceeds the critical bandwidth.
file:///g|/poster2/Kwak.htm (3 of 12) [18/07/2000 00:34:53]
(Effect of Aural Harmonics on Dissonance Perception)
In graph b, the first peak of consonance occurs at a frequency difference of about 53 Hz. At this point the higher tone is about 278 Hz and the lower tone 225 Hz (278/225 ≈ 5/4). Now consider the point at a frequency difference of 210 Hz, where the higher tone is about 376 Hz and the lower tone 166 Hz (376/166 ≈ 9/4). In terms of musical intervals, the former is a major 3rd and the latter a major 9th (or compound major 2nd). On the 7-point scale of the graphs (ordinate), the major 9th at the geometric mean frequency of 250 Hz is more dissonant than the major 3rd, even though it has the larger frequency difference. The same aspect appears in graph d.
In graph d, the first peak of consonance occurs at a frequency difference of about 165 Hz, at which the higher tone is 1086 Hz and the lower tone 921 Hz (1086/921 ≈ 20/17). This tone pair is a slightly narrower interval than a minor 3rd (6/5), and its consonance value in the graph is about 6.5. Now consider the frequency differences 330 Hz and 660 Hz. At 330 Hz the higher and lower tones are 1179 Hz and 849 Hz (1179/849 ≈ 7/5), and at 660 Hz they are 1383 Hz and 723 Hz (1383/723 ≈ 15/8). Approximately, the former is an augmented 4th and the latter a major 7th, with consonance values of about 5.55 and 5.75, respectively. Thus, at the geometric mean frequency of 1000 Hz, the consonance values for the frequency differences 165 Hz, 330 Hz, and 660 Hz are 6.5, 5.55, and 5.75, respectively, which shows that for tone pairs exceeding the critical band there is almost no relation between tonal consonance or dissonance and the frequency difference.
Graph c likewise shows no relationship between consonance or dissonance and the frequency difference beyond a frequency difference of 105 Hz. If more tone pairs exceeding the critical bandwidth had been used in the experiments, this aspect would have appeared even more clearly. In any case, graphs b, c, and d all support the conclusion that the V-shaped curve of Plomp and Levelt is valid only for tone pairs within the critical bandwidth.
① V-curve
As shown in Fig. 1, the degree of dissonance of pure-tone pairs depends decisively on the frequency difference. In particular, the maximal dissonance of two pure tones occurred at a frequency difference of about a quarter of the critical bandwidth. Beyond this point the consonance value increases smoothly and recovers almost completely at a frequency difference corresponding to the critical bandwidth. This aspect is also reported by Kameoka and Kuriyagawa (1969a, 1969b). However, there are some differences between Fig. 1 by Plomp and Levelt (1965) and the graphs by Kameoka and Kuriyagawa (1969a), and these differences are very important for understanding the discrepancy between the two sets of experiments. First, in comparison with the graphs by Plomp and Levelt (Fig. 1), the graphs by Kameoka and Kuriyagawa are more effective for investigating the consonance and dissonance of pure-tone pairs within the critical band. They adopted the frequency deviation rate as a parameter in order to obtain a perfect V-shaped curve and, in addition, applied a logarithmic scale based on simple intervals to their graphs. Thereby, it seems, they could systematically refute the 25% theory of maximal dissonance of Plomp and Levelt (1965). According to Kameoka and Kuriyagawa, the most dissonant frequency difference varies with the frequency range; it also varies with the sound pressure level of the two pure tones.
Above all, however, one of the most interesting differences between the two authors' graphs is found in the range of compound intervals, where the aspects of consonance and dissonance are apparently opposite to each other. In the graph by Kameoka and Kuriyagawa (1969a), the transition of consonance and dissonance is shown over three octaves. According to this graph, the maximal dissonance for f1 = 440 Hz is produced by the interval consisting of f1 = 440 Hz and f2 = 484 Hz, where the frequency deviation rate is 10%. This result agrees well with that of graph e in Fig. 1. However, compare the consonance and dissonance of compound intervals in Kameoka and Kuriyagawa's (1969a) graph with those in Fig. 1: in Kameoka and Kuriyagawa's graph the compound intervals form a V-shaped curve, whereas in Fig. 1 they form a Λ-shaped curve!
② The change of dissonance degree depending on the level difference between two pure tones
Through their extensive experimental study, Kameoka and Kuriyagawa showed that the degree of dissonance can be changed systematically by the level difference between two pure tones. When the level of the lower tone (f1) is L1 and that of the higher tone (f2) is L2, a tone pair with the spectrum form L1>L2 is more dissonant than one with the opposite spectrum form L1<L2. As to why L1>L2 is more dissonant than L1<L2, Kameoka and Kuriyagawa suggested that this asymmetrical property is explained by pure-tone masking effects and neural response patterns, but these cues seem somewhat obscure.
The magnitude of roughness produced between two pure tones f1 and f2 reaches its maximal value when the levels of f1 and f2 are the same (Terhardt, 1974b). In contrast, Kameoka and Kuriyagawa's conclusion is that L1>L2 is more dissonant than L1=L2, which conflicts with the existing experimental results on the relation between amplitude modulation and the magnitude of roughness. However, if one considers that the two pure tones within the critical band are not f1 and f2 but nf1 (an aural harmonic of f1) and f2, their conclusion is correct. The loudness difference between nf1 and f2 is smaller when L1>L2 than when L1=L2. If nf1 and f2 lie within the critical band, the pure tone f2 causes roughness through interference (or intermodulation) with nf1, not with f1. Therefore, consistent with the existing experimental results on the relationship between amplitude modulation and the magnitude of roughness, the roughness between nf1 and f2 is larger when L1>L2 than when L1=L2.
According to critical band theory, roughness as a cue of psychoacoustic dissonance should not be produced by pure-tone intervals exceeding the critical bandwidth. However, as shown in Fig. 1, a systematic aspect of incomplete dissonance persists even for compound intervals exceeding the critical band. This is indirect evidence that even when two pure tones exceed the critical band, roughness can be produced by intermodulation with aural harmonics. Terhardt's (1974a) assumption that "in the case of a pure-tone pair exceeding the critical bandwidth, no roughness is produced" may be right or wrong depending on its premises. If the assumption adequately recognized the systematic existence of aural harmonics, which are physiological products, and allowed that an aural harmonic can itself act as a pure tone, it is not wrong. However, if it overlooked the existence of aural harmonics by focusing only on the maximal dissonance phenomena, it ends up as an incorrect assumption that unfairly excludes the systematic aspect of "incomplete dissonance" found, for example, in pure-tone compound intervals.
Ⅲ. Experiments
1. Purpose
This study investigates the aspects of incomplete dissonance of pure-tone pairs, especially pure-tone compound intervals (CIs) exceeding the critical bandwidth. Because past research has mostly been limited to simple intervals (SIs), an experimental approach to pure-tone CIs is very suggestive. To verify that aural harmonics affect the perception of incomplete tonal dissonance in pure-tone CIs, it is essential to examine the experimental results under both equal- and unequal-loudness conditions for several intervals: augmented 4th, minor 6th, major 7th, minor 9th (compound 2nd), augmented 11th (compound 4th), and minor 13th (compound 6th).
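For concreteness, the equal-tempered frequencies of these six intervals above the 262 Hz lower tone used in the experiments can be computed directly. The semitone counts are standard music theory; the loop and rounding are only an illustrative sketch, not part of the paper's procedure.

```python
f1 = 262.0   # invariable lower tone of the experiments (Hz)

# Equal-tempered sizes of the studied intervals, in semitones above f1
intervals = {"augmented 4th": 6, "minor 6th": 8, "major 7th": 11,
             "minor 9th": 13, "augmented 11th": 18, "minor 13th": 20}

for name, n in sorted(intervals.items(), key=lambda kv: kv[1]):
    f2 = f1 * 2 ** (n / 12)   # equal temperament: one semitone = 2**(1/12)
    print(f"{name:>14}: f2 = {f2:6.1f} Hz, f2 - f1 = {f2 - f1:6.1f} Hz")
```

The three compound intervals all place f2 more than an octave above f1, so all of them exceed the critical bandwidth around their geometric mean frequency.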
2. Method
In experiment 1 (L1=55 dB, L2=55 dB) and experiment 2 (L1=55 dB, L2=40 dB), a total of 3978 intervals were randomly presented to 13 subjects (6 males, 7 females, average age 26.5), who were chosen regardless of degree of musical training. The frequency of the invariable lower tone f1 was 262 Hz, and the variable higher tone f2 took 24 equal-tempered values within two octaves above the 262 Hz lower tone. On a 7-point scale, each subject judged the degree of tonal consonance or dissonance of each interval.
3. Results
① In the condition L1=55 dB, L2=40 dB, the augmented 11th (compound tritone) was perceived as more dissonant than the tritone (F(1,12)=14.74, p<.01).
② In L1=55 dB, L2=40 dB, the minor 13th (compound minor 6th) was perceived as more dissonant than the minor 6th (F(1,12)=8.14, p<.025).
③ The augmented 11th in L1=55 dB, L2=40 dB was perceived as more dissonant than in L1=55 dB, L2=55 dB (F(1,12)=9.90, p<.01).
④ The minor 13th in L1=55 dB, L2=40 dB was perceived as more dissonant than in L1=55 dB, L2=55 dB (F(1,12)=12.28, p<.01).
The above results are interesting in the historical context of the debate over whether aural harmonics exist. Incomplete dissonance in CIs can be discussed in parallel with the problem of beat and roughness sensation in mistuned consonances. Paradoxically, past studies of the beats of mistuned consonances (Tonndorf, 1959; Plomp, 1967) suggest that beats can be produced by pure-tone pairs exceeding the critical band, depending on the frequency ratio between the two pure tones. Because beats are physically closely related to roughness, a full discussion of results ①, ②, ③, and ④ must relate them to the following three subjects: the origin of the beats of mistuned consonances, Clack's experiments on aural harmonics, and the traveling-wave patterns on the basilar membrane produced by pure-tone pairs exceeding the critical band.
Ⅳ. Discussion
Evidence (b): Subjects who could not identify aural harmonics in the aural harmonics audibility test could nevertheless hear the beats of mistuned consonances. For example, in the audibility test for 125 Hz (SL 65 dB), subjects who could not identify the aural harmonics 250, 375, 500 Hz, etc., could hear the beat produced between 125 Hz (SPL 100 dB) and 251 Hz (SPL 90 dB).
Evidence (c): Beats are produced even in the mistuned consonances 5:9 and 4:9. Traditionally, beats of mistuned consonances of the types 1:2, 1:3, 1:4, ... 1:n have been explained by aural harmonics. Beats in 5:9 and 4:9, on the other hand, are difficult to explain by aural harmonics.
On the basis of this evidence, Plomp concluded that the origin of the beats of mistuned consonances is not aural harmonics but a waveform variation (phase interference) caused by direct interaction between the two primary tones (Plomp, 1967, 1976). On the direct interaction theory, it is presumable that roughness could be produced by a phase effect depending on the frequency ratio even when the two primary tones exceed the critical band. This contradicts Plomp and Levelt's (1965) critical band theory and Terhardt's (1974a) assumption about the production of roughness, because the generation of the phase effect and roughness in pure-tone pairs depends on the frequency ratio of the two primary tones. Kameoka and Kuriyagawa's assumption that "for a given level difference, a dyad with a spectrum form (L1>L2) is more dissonant than that with its opposite form (L1<L2)" likewise neglects roughness produced by the phase effect depending on the frequency ratio. Had they considered that the beats of mistuned consonances originate in a two-tone interaction depending on the frequency ratio, their assumption would have been more complicated. In any case, the studies of tonal dissonance sensation by all the authors above overlook the influence of the phase effect, or of roughness produced by aural harmonics, in pure-tone pairs exceeding the critical band.
Apart from all the arguments above, however, one of the purposes of this paper is not to support the direct interaction theory, according to which the phase effect resulting directly from the two primary tones is an
The four graphs in Fig. 2 show the dissonance aspects of SIs and CIs in experiments 1 and 2. First, observe the difference between the dissonance aspects for L1=L2=55 dB (graph a) and for L1=55 dB, L2=40 dB (graph b). Noticeable shifts are found at the major 7th, minor 9th, augmented 11th, and minor 13th. In addition, the augmented 11th and minor 13th are more dissonant than the augmented 4th and minor 6th in both graphs a and b. In particular, the differences between the augmented 4th and 11th, and between the minor 6th and 13th, are significant in graphs b and d.
Evidence that these differences between the augmented 4th and 11th, and between the minor 6th and 13th, can be explained by both the direct interaction and aural harmonics theories is shown in graph c. The
Fig. 3. Illustrations of the vibration patterns along the basilar membrane produced by a pure-tone minor 9th. Plomp's direct interaction model is shown in a, and an aural harmonics model in b. Plomp's direct interaction model contradicts his own critical band model in the case of a more intense lower tone and in the low frequency range below 1000 Hz. Model d is a compromise between the direct interaction and aural harmonics models.
Now we need to consider a problem related to the critical band model presented by Plomp and Levelt (1965). The V-shaped aspect is evident only at relatively low intensities and must be limited to the range of a single critical band. Paradoxically, applying Plomp's critical band model at higher intensities of the lower tone, or to intervals exceeding the critical band, is invalidated by his own direct interaction model. His direct interaction model essentially predicts not that "consonance and dissonance of pure-tone pairs depend on frequency difference rather than frequency ratio" (Terhardt, 1974a), but that consonance and dissonance of pure-tone pairs depend on both the frequency difference and the frequency ratio.
Ⅴ. Conclusion
In this study, the aspects of incomplete dissonance of pure-tone pairs, especially pure-tone compound intervals (CIs) exceeding the critical bandwidth, were investigated. The experimental results provide several pieces of evidence that the V-shaped curve, a basic psychoacoustic concept explaining dissonance mechanisms in pure-tone pairs, must be applied only within a single critical band. Furthermore, given that roughness can be produced even when both primary tones of a pure-tone pair exceed the critical band, this study argues that aural harmonics can be crucial cues in the formation of the dissonance sensation.
References
Allen, J. B. (1989). Is the basilar membrane tuning the same as neural tuning? Cochlear mechanisms, ed. by J. P. Wilson and D. T. Kemp, New York: Plenum Press, 453-460.
Allen, J. B., and Fahey, P. F. (1993). A second cochlear-frequency map that correlates distortion product and neural tuning measurements, J. Acoust. Soc. Am. 94, 809-816.
Bekesy, G. (1963). Hearing theory and complex sounds, J. Acoust. Soc. Am. 35, 588-601.
Bekesy, G. (1963). Three experiments concerned with pitch perception, J. Acoust. Soc. Am. 35, 602-606.
Bekesy, G. (1972). The missing fundamental and periodicity detection in hearing, J. Acoust. Soc. Am. 51, 631-637.
Clack, T. D. (1967a). Aural harmonics: The masking of a 2000 Hz tone by a sufficient 1000 Hz fundamental, J. Acoust. Soc. Am. 42, 751-758.
Clack, T. D. (1967b). Aural harmonics: Preliminary time-intensity relationships using the tone-on-tone masking technique, J. Acoust. Soc. Am. 43, 283-288.
Clack, T. D. (1968). Aural harmonics: Tone-on-tone masking at lower frequencies of a fundamental, J. Acoust. Soc. Am. 44, 384.
Clack, T. D., and Erdreich, J. (1971). Aural harmonics: A possible relation of loudness, J. Acoust. Soc. Am. 51, 113.
Clack, T. D., Erdreich, J., and Knighton, R. W. (1972). Aural harmonics: The monaural phase effects at 1500 Hz, 2000 Hz, and 2500 Hz observed in tone-on-tone masking when f1 = 1000 Hz, J. Acoust. Soc. Am. 52, 536-541.
Clack, T. D. (1975). Some influences of subjective tones in monaural tone-on-tone masking, J. Acoust. Soc. Am. 57, 172-180.
Clack, T. D. (1977). Growth of the second and third aural harmonics of 500 Hz, J. Acoust. Soc. Am. 62, 1060-1061.
Erdreich, J., and Clack, T. D. (1971). Aural harmonics: A comparison of two models for the tone-on-tone paradigm, J. Acoust. Soc. Am. 51, 113.
Fastl, H., Jaroszewski, A., Schorer, E., and Zwicker, E. (1990). Equal loudness contours between 100 and 1000 Hz for 30, 50, and 70 phon, Acustica 70, 197-201.
Giguere, C., Smoorenburg, G. F., and Kunov, H. (1997). The generation of psychoacoustic combination tones in relation to two-tone suppression effects in a computational model, J. Acoust. Soc. Am. 102, 2821-2830.
Kameoka, A., and Kuriyagawa, M. (1969a). Consonance theory part I: Consonance of dyads, J. Acoust. Soc. Am. 45, 1451-1459.
Kim, D. O., Molnar, C. E., and Pfeiffer, R. R. (1973). A system of nonlinear differential equations modeling basilar-membrane motion, J. Acoust. Soc. Am. 54, 1517-1529.
Kwak, S. Y. (1998). A study of combination tone and dissonance sensation, Journal of Music Theory, Vol. 3, Seoul National University, Western Music Research Institute.
Letowski, T. (1975). Difference limen for nonlinear distortion in sine signals and musical sounds, Acustica 34, 106-110.
McAdams, S. (1982). Spectral fusion and the creation of auditory images, Music, mind, and brain, ed. by M. Clynes, New York: Plenum Press, 279-298.
Meddis, R., and O'Mard, L. (1997). A unitary model of pitch perception, J. Acoust. Soc. Am. 102, 1811-1820.
Noorden, L. (1982). Two channel pitch perception, Music, mind, and brain, ed. by M. Clynes, New York: Plenum Press, 251-269.
Plomp, R., and Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth, J. Acoust. Soc. Am. 38, 548-560.
Plomp, R. (1967). Beats of mistuned consonances, J. Acoust. Soc. Am. 42, 462-474.
Rakowski, A., Miskiewicz, A., and Rosciszewska, T. (1998). Roughness of two-tone complexes determined by absolute magnitude estimation, Proceedings of the 5th International Conference on Music Perception and Cognition, Seoul National University, Western Music Research Institute, 95-100.
Schubert, E. D. (1969). On estimating aural harmonics, J. Acoust. Soc. Am. 45, 790-791.
Stevens, S. S., and Davis, H. (1938). Hearing: Its psychology and physiology, New York: American Institute of Physics.
Terhardt, E. (1974a). Pitch, consonance, and harmony, J. Acoust. Soc. Am. 55, 1061-1069.
Terhardt, E. (1974b). On the perception of periodic sound fluctuations (roughness), Acustica 30, 201-213.
Terhardt, E. (1976). Ein psychoakustisch begründetes Konzept der musikalischen Konsonanz [A psychoacoustically based concept of musical consonance], Acustica 36, 121-137.
Tonndorf, J. (1959). Beats in cochlear models, J. Acoust. Soc. Am. 31, 608-619.
Proceedings paper
Tonal organization is an essential process by which a listener perceives a mere pitch string as a coherent melody. The process, which is constrained and guided by the tonal schema the listener has acquired, organizes an input pitch sequence into a system of tonality woven around a "tonal center." For an input pitch sequence to be organized, the tonal center of the input melody must be fixed in the mind, and vice versa. As a result of this process, the listener can perceive tonality (or atonality) in a melody, and can perceive a melody to be in a certain key. Previous studies have shown that key perception is determined by structural characteristics of the pitch content of a set (Abe & Hoshino, 1990; Cuddy, 1997; Krumhansl, 1990). In other words, they argued that the perceptual interpretation of the key of a melody is strongly affected by the pitch content of the set of constituent notes in the melody. Here we would like to point out the possibility that key perception is also affected by the time-dependent characteristics of the pitches' ordering within the melody; that is, we assume that key perception can be affected by the temporal ordering of pitches. Consider, for example, the following two tone sequences, both composed of the same set of five notes [C4, D4, E4, F4, and G4]. The only difference between the two sequences is the temporal ordering of the five notes; Sequence 1: G4-E4-F4-D4-C4, Sequence 2: G4-C4-D4-E4-F4. When listening to the two sequences, listeners might perceive the first as being in C major and the second as being in F major.
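This point can be made concrete: any key-finding account that considers only pitch content and ignores order is blind to the difference between the two sequences. The sketch below is our illustration (the variable names are ours), not part of the experiment.

```python
from collections import Counter

seq1 = ["G4", "E4", "F4", "D4", "C4"]   # Sequence 1 (heard as C major, per the text)
seq2 = ["G4", "C4", "D4", "E4", "F4"]   # Sequence 2 (heard as F major, per the text)

# Strip the octave digit to compare pitch content alone:
content1 = Counter(note[:-1] for note in seq1)
content2 = Counter(note[:-1] for note in seq2)

# Identical pitch content: a purely distributional account must predict the
# same key for both, so only temporal ordering can explain a difference.
print(content1 == content2)   # True
```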
The purpose of this experiment is to investigate whether the perceptual interpretation of the key of an input melody can be influenced by the temporal ordering of pitches, and if so, what kinds of cues influence these key interpretations.
Method
Participants
Thirty-one undergraduate and graduate students of Hokkaido University (average age: 22.8 years; age range: 18-39 years) participated in this experiment. None of the participants was a professional or serious amateur musician, although some had taken music lessons.
Materials
file:///g|/poster2/Matsunag.htm (1 of 7) [18/07/2000 00:34:57]
THE ROLE OF THE TEMPORAL ORDERING OF PITCHES IN TONAL ORGANIZATION
Eighteen tone sequences were prepared as stimulus materials. Every sequence was composed of the same set of six tones (pitch set). The absolute pitches of the six tones were C#4, D4, E4, F#4, G4, and A4. All tones of the pitch set could only be interpreted as scale tones of D major or B minor, in accordance with the traditional music theory of the Western diatonic scale. Only the ordering of the same pitch set was experimentally manipulated; that is, the difference among the 18 sequences was the order of the constituent notes. Some pitch positions of the constituent notes in each sequence were assigned to either a higher or a lower octave from the original pitch, but the pitch range of each sequence was restricted to one octave. In other words, the highest and lowest pitches of each sequence fell within one octave, though not necessarily the same octave (see Table 1). The duration of each tone was 1.0 s, so the total duration of a sequence was 6.0 s.
Task
We adopted the "final-tone extrapolation" task (Abe & Hoshino, 1990; Hoshino & Abe, 1984) as the participants' task in this experiment. The task is based on the assumption that the selected final tones are the tonic or nuclear pitches of a certain key. This assumption derives from the characteristics of tonality, defined as "the predominance of a certain tone, the tonic, over others in a piece of music" (Abe & Hoshino, 1990).
Procedure
Each participant sat in front of a keyboard and was presented with the sequences through a loudspeaker placed 1.0 m away. In each trial, each sequence was presented three times, with a 6.0 s interval between presentations. The participants were told that they would hear a series of tone sequences, and that for each one their task was to select the pitch that would be the best ending of the sequence. During and after the presentation of each sequence, the participants were allowed to play tones on the keyboard to help them make their final-tone selection. They were required to select the final tone from a printed list of the twelve pitches in the octave (C, C#, D, D#, E, F, F#, G, Ab, A, Bb, B). They were then asked to rate their confidence in the selected final tone on a 5-point scale (5 = full confidence, 1 = poor confidence). When the participants felt that more than one pitch could be the best final tone, they were allowed to select as many pitches as they liked, but had to rate their confidence in each selected pitch as the best final tone. The timbre and dynamics of the keyboard sound were the same as those of the sequences, and the tones played by the participants on the keyboard were sounded through the same loudspeaker. After 2 practice trials, each participant performed the 18 experimental trials in a random order.
Results and Discussion
In this experiment, the participants were allowed to select more than one pitch as the final tone when they so desired. In fact, the average number of final-tone responses was 1.06 per sequence. Twenty-five of the 31 participants always selected only one pitch as the final tone throughout the trials. Although the remaining 6 participants sometimes selected more than one pitch, they did not do so for every sequence. The largest number of responses for one sequence was 3, seen in only one participant's response to one sequence.
Key interpretation
The number of final-tone responses in each pitch category was counted. We adopted the coefficient of concentration of selection (CCS; Iwahara, 1964) as a measure of the degree to which the final-tone responses for each sequence concentrate within a few specific pitch categories. The CCS is calculated by an equation (see Iwahara, 1964) in which K is the number of response categories, that is, the number of pitches in the octave (= 12), and N is the total number of final-tone responses, which may vary from sequence to sequence.
Table 1 shows the results of the final-tone responses. The first column lists the 18 stimulus tone sequences, ordered according to their CCS values. The second column gives the temporal ordering of the six tones of each sequence, in absolute pitch notation and in interval notation. The third column gives the number of final-tone responses in each of the 12 pitch categories. The fourth column
As seen in the third column of Table 1, pitch D was the most commonly selected final tone over all the sequences. In particular, pitch D was chosen most often for S01, S02, S03, S04, S05, and S06, all of which also had high CCS values. For S07, S08, and S09, the number of pitch D responses was more or less equivalent to the number of pitch F# responses. The final-tone responses were dispersed over several pitches for the remaining sequences, all of which had relatively low CCS values. None of the participants selected pitch B or pitch D# as the final tone for any sequence. A chi-square analysis of the distribution of the final-tone responses revealed that the distributions differed significantly among the 18 sequences (χ² = 114.58, df = 187, p < .01). These results suggest that the participants' key interpretation varied with each sequence, even though all sequences consisted of the same set of pitches and differed only in their temporal ordering. This implies that key perception depends not only on the pitch content of a set, but also on the temporal ordering of pitches.
For the confidence rating scores (Table 1, fifth column), a one-way analysis of variance indicated that the difference between the confidence ratings for the sequences was not significant (F(17, 510) = 1.02, ns). The mean confidence rating was 3.47. The correlation between the confidence ratings and the CCS values was significant (r = 0.67, p < .005). These results imply that the participants felt a degree of tonality somewhat above the middle of the scale for all the sequences, with only small differences between sequences.
What kind of cues influence key interpretations?
As mentioned above, the results showed that the final-tone response could vary across melodies that consisted of the same pitch set but differed in the temporal ordering of pitches. Nevertheless, the participants selected pitch D most often as the final tone across several sequences (S01 to S06). These results suggest that temporal ordering itself is not the sole cue guiding key interpretation; more specific local characteristics of pitch ordering within a melody may also have an effect. For example, if a set of sequences differ in pitch order but have one or more interval relationships between pitches in common, listeners may identify them as belonging to the same key.
It is known that tonal organization is not derived from absolute pitch information in a melody, but from relative pitch relationships in a melody (Abe, 1987;
Bartlett & Dowling, 1980; Dowling, 1986). Based on this view, we will focus on interval relationships in melodies, and examine whether the existence of
specific intervals in melodies might lead listeners to make specific key interpretations.
In this paper, we denote intervals of the tone sequences by positive integers for ascending intervals and by negative integers for descending intervals (one unit
= a half-step interval). For example, the ascending minor second is denoted as (+1), the descending perfect fourth as (-5), and so on. As seen in Table 1, for
example, S01 is denoted in absolute pitch notation: D4 - F#4 - A4 - G4 - E4 - C#4, and in interval notation: (+4) - (+3) - (-2) - (-3) - (-3).
Twelve intervals were used in the 18 tone sequences for this experiment (+1, -1, +2, -2, +3, -3, +4, -4, +5, -5, +6, and -6). No other intervals were included.
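The paper's signed-semitone notation can be produced mechanically from the note names. The helper below is our sketch (the function names are ours); it reproduces, for example, the Table 1 intervals for S01 quoted above.

```python
PITCH_CLASS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def to_semitone(note):
    """'F#4' -> absolute semitone number (C4 = 60, MIDI convention)."""
    name, octave = note[:-1], int(note[-1])
    return 12 * (octave + 1) + PITCH_CLASS[name]

def interval_notation(sequence):
    """Signed half-step intervals: ascending positive, descending negative."""
    semis = [to_semitone(n) for n in sequence]
    return [b - a for a, b in zip(semis, semis[1:])]

s01 = ["D4", "F#4", "A4", "G4", "E4", "C#4"]
print(interval_notation(s01))   # [4, 3, -2, -3, -3], i.e. (+4)-(+3)-(-2)-(-3)-(-3)
```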
We examined the frequency of occurrence of similar interval relations in sequences for which the participants' responses were also similar (see Table 2).
First, we examined the 6 sequences (S01, S02, S03, S04, S05, and S06) for which pitch D was the most common response. There were no intervals common to all 6 sequences. However, intervals (-5) and (+3) occurred in 5 of the 6 sequences, and intervals (+4), (-3), and (-2) occurred in 4 of the 6. Intervals (±6) and (±1), on the other hand, did not occur in any of these 6 sequences.
We then examined the frequency of each interval relationship across all 18 sequences. Interval (-5) occurred in 11 of the 18 sequences, including those for which the responses were not pitch D (those with low CCS values). Therefore, it is unlikely that interval (-5) is a strong cue influencing key interpretation in these sequences. Similarly, intervals (+3), (-3), and (-2) are unlikely candidates, as they were found in many sequences with low CCS values.
The remaining interval, (+4), was found in 5 of the 18 sequences. Four of these (S01, S04, S05, and S06) had the highest CCS values for pitch D. In the remaining sequence (S11), the number of pitch D responses was more or less equivalent to those of pitches E and A, and accordingly it had a low CCS value. In the former 4 sequences (S01, S04, S05, and S06), the interval (+4) consisted of pitch D4 and pitch F#4 (D4-F#4); in S11, on the other hand, it consisted of pitch A4 and pitch C#4 (A4-C#4). Thus, this major-third interval relationship could have led listeners to recognize the lower (and first) pitch of the interval as the tonic or key center of these sequences.
Interval (+4), i.e., the major third, is considered an important interval for perceiving the tonality of a melody (Krumhansl, 1979, 1990). The results of this experiment suggest that the presence of interval (+4) in a melody strongly inclined listeners to identify the lower pitch of the interval as the tonic of the key. Thus interval (+4) seems to be one of the strong cues guiding key interpretation.
Were there, then, any common intervals in the sequences with low CCS values? There were no common interval relationships that occurred in the 9
sequences with the lowest CCS values (see Table 2). In 5 out of these 9 sequences, intervals (-5) and (-2) occurred and in 4 out of these 9 sequences, intervals
(+5), (-4), (-3), and (-1) occurred.
There were four intervals that were found in those 9 sequences but did not occur in the above-mentioned 6 sequences with the highest CCS values: (±6)
and (±1). Interval (+6) occurred in only one sequence (S13). Similarly, interval (-6) only occurred in S11 and S18. Interval (+1) occurred in S08, S10, S16,
and S17. Interval (-1) occurred in S09, S11, S14, S15, and S18. These results suggest that intervals (±6) and (±1) can be negative cues for guiding key
interpretation. Butler and Brown (1984, 1994) argued, based on Browne (1981), that these intervals are heard less frequently than other intervals in Western
music, and that precisely because of their rarity they function as strong cues for guiding key interpretation.
In this experiment, the sequence materials did not contain intervals (±7), (±8), and so on. Moreover, a uniform occurrence of each interval was not
manipulated in the material sequences. Further research including a wider range of intervals is required to obtain a more detailed picture of the influence of
interval relationships on key interpretation.
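The interval-counting analysis described above can be illustrated with a minimal sketch. The sequences below are invented MIDI note numbers, not the study's stimuli, and the helper name is ours; a signed interval of +4 denotes an ascending major third.

```python
# Tallying signed interval occurrences across tone sequences, in the spirit
# of the analysis above. Pitches are MIDI note numbers (invented examples);
# an interval of +4 semitones is an ascending major third.

def interval_counts(sequences):
    """Map each signed interval (in semitones) to the number of
    sequences in which it occurs at least once."""
    counts = {}
    for seq in sequences:
        intervals = {b - a for a, b in zip(seq, seq[1:])}
        for iv in intervals:
            counts[iv] = counts.get(iv, 0) + 1
    return counts

sequences = [
    [62, 66, 64, 59],   # contains +4 (D4 up to F#4), -2, and -5
    [69, 73, 71, 69],   # contains +4 (a major third above A4) and -2
]
counts = interval_counts(sequences)
```

Counting each interval once per sequence (rather than per occurrence) matches the "in n out of 18 sequences" tallies reported in the text.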
References
Abe, J. (1987). Senritsu wa ikani shori sareruka [How is a melody processed?]. In G. Hatano (Ed.), Ongaku to ninchi (pp. 41-68). Tokyo: Tokyo
University Press.
Abe, J., & Hoshino, E. (1990). Schema driven properties in melody cognition: Experiments on final-tone extrapolation by music experts.
Psychomusicology, 9, 161-172.
Bartlett, J. C., & Dowling, W. J. (1980). The recognition of transposed melodies: A key-distance effect in developmental perspective. Journal of
Experimental Psychology: Human Perception & Performance, 6, 501-515.
Back to index
Proceedings abstract
Mark G. Orr
morr@uic.edu
Background:
Aims:
This study investigated whether musical training would facilitate memory when
encoding is implicit.
Method:
Results:
Conclusions:
Even when the encoding is implicit, musical training affords better encoding of
relevant stimuli as evidenced by the experts' superior recognition. This has
implications for chunking as a mechanism of expert performance in general.
Back to index
Proceedings paper
Ever since that epochal discovery made some 25 centuries ago by a man from Samos, Pythagoras, musical
sound has been an interdisciplinary study in that its essence has embraced music, aesthetics, physics,
psychology, physiology, neurology and architecture. A proliferation of books and treatises on the physics
of music has provided humanity a substantive and objective base for musical sounds. These writings have
covered a wide range of topics such as: vibrating columns of air and accompanying idiosyncrasies of the
wind instruments; vibrating strings and the factors that govern resonance/tone quality; profuse
explications of the human auditory system, both subjectively and objectively; vibrating characteristics of
bars, plates, and membranes; from pure tones to electric synthesizers; and, from the phenomena of the
behavior of sound waves to the characteristics of perceived acoustics.
Introduction
From the many facets of psychoacoustics mentioned above, this study will investigate one aspect of
human auditory musical functioning that is most germane to musicality, viz., discernment of
pitch differences. One component of this desired musical attribute, difference limen (DL) acuity, is
defined as 'an amount by which a stimulus must change in the appropriate physical property in order for an
observer to detect a difference in sensation a certain criterion percentage of the time.' (Radocy & Boyle,
1997, p. 76) Perception of discrete differences in the frequencies of two separately generated tones is
one of the important characteristics of musical behavior. Many forms of musical behavior that are
fundamental to successful performance or perception require the individual musician to possess a
relatively high level of pitch discrimination ability (Sergeant and Boyle, 1980).
This investigation was concerned with whether pitch acuity abilities (discernment of frequency
differences--difference limens), though not actually a requirement of pianists' performances, are a concomitant of
their development, as they must be for saxophonists, trombonists, violinists, and vocalists. In comparison
to a group of vocalists, each of whom has to make physical adjustments as a prime requisite of their
performances--and to violinists, trombonists, and saxophonists, to successively lesser degrees--
do pianists learn this musical attribute simply through maturational experiences?
Why such an investigation? Two reasons are immediately apparent. First, we need to know if the development
of pitch discernment ability must be separately emphasized in the pedagogical processes. Second, a
relatively high pitch discrimination ability, and its development, exemplifies many forms of musical
behavior that are critical to successful cognition or performance of western music. This ability to
differentiate between tones of different frequencies continues to be acknowledged as prime in that most
researchers have included a form of measure of pitch judgment among their tests and various facets of
research.
Related Literature
As with numerous research reports (Bruner, 1984, and Sloboda, 1985), this study is involved, as in most
naturally occurring instances, with the subject's perceived pitch of the fundamental frequency of each of
the stimuli. Whether pure tones or complex tones, it is 'normal hearing' for the human's basilar membrane
and the higher nerve centers, responding in concert, to identify the fundamental pitch of each stimulus
(Bekesy, 1960). Further, '. . . training has a great effect on what is perceived. . . . the experience of
every teacher of musicianship--has demonstrated a marked difference between the responses of trained
musicians and those of other listeners.' (Bruner, 1984, p. 39) Discerning differences in fundamental
pitches, and the sharpening of this ability then, was the salient motivation of this study.
Two-frequency dyads can be discerned as a musical stimulus, or the listener can hear two tones with pitches
corresponding to the individual frequencies of the dyad if the difference between the frequencies is not
too small. (Smoorenburg, 1970)
Roederer (1975) says this is due to the ability of the cochlea to extricate the frequency components from a
complex vibration pattern. A single vibration pattern at the oval window gives rise to two resonance
regions of the basilar membrane. If the frequency difference between the two component tones is large
enough, the corresponding resonance regions are sufficiently separated from each other. Each one
oscillates with a frequency corresponding to the component tone. If the frequency difference is smaller
than a certain amount (the DL), the resonance regions overlap and only one tone of intermediate pitch with
modulated or 'beating' loudness is heard. It is to be remembered that the difference limen (DL) of a
hearing sensation is the difference in frequencies which will give rise to a perception of two different
pitches in one half of the total number of trials. In this connection the pitch sub-test of the well-known
Seashore battery includes intervals down to six cents.
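The 50%-criterion definition of the DL given above can be sketched in code. The response proportions below are invented for illustration, not data from any study, and linear interpolation is only one conventional way to locate the crossing point.

```python
# Estimating a difference limen (DL): the frequency difference at which a
# listener reports two distinct pitches on 50% of trials. Illustrative
# sketch only; the response proportions below are invented.

def estimate_dl(cents, p_two_pitches, criterion=0.5):
    """Linearly interpolate the cents separation at which the proportion
    of 'two pitches' responses first reaches the criterion (default 50%)."""
    pairs = list(zip(cents, p_two_pitches))
    for (c0, p0), (c1, p1) in zip(pairs, pairs[1:]):
        if p0 < criterion <= p1:
            # linear interpolation between the bracketing points
            return c0 + (criterion - p0) * (c1 - c0) / (p1 - p0)
    return None  # criterion never crossed within the tested range

cents = [10, 20, 30, 40, 50]          # separation of the dyad, in cents
p_two = [0.0, 0.2, 0.4, 0.8, 1.0]     # proportion of 'two pitches' answers

dl = estimate_dl(cents, p_two)        # falls between 30 and 40 cents
```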
Leipp (1977) reported that 50 per cent of the students in the Conservatoire de Paris were able to
discriminate intervals of four cents and Rakowski (1977) observed some students at the Academy of Music in
Warsaw who could discriminate intervals with two cents differential. Meyer (1978) reported similar results
but cautioned that each musician's discrimination range varied according to timbres. In the foregoing,
discernment of frequency differences (of the fundamentals) was better than in studies such as the present
one due to the fact that musical sounds present a timbre identification element not present when the
stimuli are audio generated.
Encompassing the frequency ranges utilized in orchestral music, when a number of pitch judgments are
averaged, the smallest DL is generally 30 cents for most trained musicians. According to several authors
(Thurlow and Bernstein, 1957, & Plomp, 1964), the auditory separation of two simultaneously sounding
tones in most musical frequency ranges may be accomplished only when the interval between the
simultaneously sounding frequencies is not smaller than a semitone. Lundin (1985) reports that the average
person has a DL of plus or minus three cycles when the reference frequency is 435 Hz. There are many
reports which emphasize that individuals differ in their ability to discriminate differences in
frequencies--that these limits vary considerably from individual to individual, dependent on the occasion
and the frequency range (Roederer, 1975, and Radocy & Boyle, 1997). In a previous study of this nature,
Parker (1983) found no significant differences in the competencies of a group of pianists and a group of
trombonists to discern difference limens.
In the early years of a person's musical training, dependent on the musical medium, accuracy of pitch
discernment competency attainment is either paramount, or not. Because saxophonists, trombonists,
violinists and vocalists must ultimately pay constant attention to pitch and attend to very small degrees
of variances in pitch, it would seem that their acuity for frequency discernment within their instruments'
(the human voice is considered an instrument) range will be more discriminative than the acuity of
pianists, who are not required to pay constant attention to pitch. Pianists obviously are concerned, in
performance on a given piano, indirectly with the pitch variances as 'built-in' to the equal-tempered
scale. Consequently, although the other three musical elements (loudness, timbre, and time) are of
constant musical concern to all musicians, this study will investigate only the music element of pitch and
its discernment competency-level in respective groups of pianists, saxophonists, trombonists, violinists,
and vocalists.
Hypotheses
Ho -- There is no difference in the reports of difference limens (DL's) of respective groups of pianists,
saxophonists, trombonists, violinists, and vocalists.
H1 -- There are differences in the reports of difference limens (DL's) of respective groups of pianists,
saxophonists, trombonists, violinists, and vocalists.
Experimental Design
Subjects
The 75 subjects were university students--15 each in groups of pianists, saxophonists, trombonists,
violinists, and vocalists, all with ontologically normal hearing. The criteria for the pianists were that
each pianist did not play any other instrument in an instrumental performing organization and that they be
registered for applied piano lessons. The criterion for each of the subjects in the other groups was that
they be registered for applied music lessons on their designated instruments.
The Johnson Intonation Trainer was used exclusively to provide the frequency signals for the experiment.
This is a tunable keyboard with two tunable sections, each ranging from C3 (130.81 Hz) to C6 (1046.48 Hz).
Each note on the keyboard is capable of being tuned up or down approximately a major third from its usual
equal-tempered scale pitch. A Stroboconn Model 6T5 was used to set and check each frequency chosen.
Amplification of the signals was provided by a Technics SU-V7 Amplifier and a pair of JBL 4301B Studio
Monitors. Loudness preferences were set by each subject.
Procedure
The timbre on the Johnson Intonation Trainer chosen for this study was the 'flute' setting. Because of the
limited range of this instrument, the fact that tuning any note tunes the octaves as well, and in order to
get the maximum number of pairs of notes, it was necessary to use only the notes A, C, and E as the basic
pitches. This provided for A's at 220 Hz, 440 Hz, and 880 Hz; C's at 130.82 Hz, 261.26 Hz, 659.24 Hz, and
1046.48 Hz; and E's at 164.8 Hz and 329.62 Hz. A total of 60 dyads were available. Each stimulus was a
combination of the chosen fixed frequency and a higher frequency designated randomly in the dyads, ranging
from the fixed frequency to a frequency 100 cents higher, in increments of 10 cents. (See appendix B) It
should be noted here that the variances used in constructing the dyads were listed in random order so
that the subjects could not predict the next stimulus. Also, the two
frequencies making up each dyad stimulus were sounded simultaneously.
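The cents-based construction of the dyads follows the standard relation f2 = f1 * 2^(cents/1200). A minimal sketch; the fixed frequency A4 = 440 Hz and the 10-cent steps come from the text, while the helper name is ours.

```python
# Converting cents deviations (as in Appendix B) into dyad frequencies,
# using the standard relation f2 = f1 * 2**(cents / 1200).

def detune(f_hz, cents):
    """Frequency obtained by raising f_hz by the given number of cents."""
    return f_hz * 2 ** (cents / 1200)

fixed = 440.0  # A4, one of the study's fixed frequencies
# One dyad per deviation: 10, 20, ..., 100 cents above the fixed tone.
dyads = [(fixed, detune(fixed, c)) for c in range(10, 101, 10)]

upper = detune(440.0, 100)   # a 100-cent (semitone) offset, about 466.16 Hz
```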
Prior to the presentation of the 60 stimuli, the experiment was explained to each subject individually.
This included a discussion of frequencies, two instruments playing together, tuning and the production of
beats. This was illustrated aurally by the use of the Johnson Intonation Trainer and visually by the
Stroboconn. When each subject indicated that each facet was understood, the subject was asked to listen to
a perfect unison using two notes on the keyboard. One of these notes was then detuned gradually sharp until
the subject indicated that a second pitch was audible rather than a unison with beats. The
Stroboconn was used to check how far sharp the upper note had been tuned. Next, two notes were tuned 100
cents apart; i.e., a half step. These two tones were played together and the upper one detuned flat until
the subject indicated that a unison with beats was heard rather than two distinct pitches. Again the
Stroboconn was used to measure this difference.
When the subject had indicated that the procedure was understood, the 60 stimuli were performed by the
administrator at the keyboard of the Johnson Intonation Trainer. The duration of each stimulus was
normally two seconds with an equal amount of silence between. The subject was asked to write '1' or '2' in
the appropriate blank indicating whether the stimulus was heard as one pitch or two pitches. The time was
lengthened if the subject needed more time on a stimulus. Repeats were permitted since the time
factor was not crucial.
Processing of the raw data was done with two procedures to both obtain and check x2 (chi-square) values.
First, the formula
x2 = Σ (O - E)2 / E
was utilized, computing on the basis of the experiment being a one-sample test.
x2 was found to be 2.21. The tabled value of 3.84 > 2.21 (P > .05) indicated failure to reject Ho.
Second, the chi-square formula for two independent samples was utilized (the
pianists being one sample and all other subjects being the other sample). x2 was found to be 1.04. The
tabled value of 3.84 > 1.04 (P > .05) indicated failure to reject Ho. (Madsen and Moore, 1997)
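The one-sample chi-square computation described above can be sketched as follows. The observed and expected counts here are hypothetical, since the paper does not reproduce its raw response tallies; 3.84 is the tabled critical value for df = 1 at p = .05.

```python
# One-sample chi-square statistic: chi2 = sum((O - E)**2 / E) over
# categories, compared against the critical value 3.84 (df = 1, p = .05).
# Observed and expected counts below are hypothetical illustrations.

def chi_square(observed, expected):
    """Pearson chi-square statistic for paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [160, 140]   # hypothetical response counts in two categories
expected = [150, 150]   # equal split under the null hypothesis

chi2 = chi_square(observed, expected)
reject_h0 = chi2 > 3.84   # False: fail to reject, as in the study
```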
Though the statistical treatment of the data failed to indicate differences in the groups' difference
limens, the raw data do reveal some cognitive differences in the groups, albeit without significance at
the .05 confidence level. They are (groups in the order pianists, saxophonists, trombonists, violinists, vocalists):
X (mean)  28 38 34 29 31
R (range) 17 22 18 17 18
No significant differences were found, statistically, in the difference limens of groups of pianists,
saxophonists, trombonists, violinists, and vocalists respectively. The comparison of other data, as shown
above, does seem to indicate that pianists and violinists, as groups, perceive two tones in each stimulus
dyad at a smaller DL than do the saxophonists, trombonists, and vocalists, as groups. A saxophonist made
the most errors in pitch discernment and a pianist scored the most accuracies (i.e., a DL); taken as groups,
the violinists and saxophonists were the most accurate, the trombonists rather 'in the middle,' and the
vocalists and pianists, as groups, scored lowest in accuracies of responses. In comparative
psychoacoustical terminology, in cents:
DL* 39 28 31 22 34
*Difference Limens
Due to the nature of their musical mediums, saxophonists, trombonists, violinists, and vocalists have to
give constant attention to adjustment for pitch accuracies, whereas pianists do not have this concern. It
would seem that there would be significant differences in the groups' frequency discernment abilities.
Since there is not (at least in this investigation), at least three questions are elicited. One, is it
that the pianists, through listening as an adjunct to their acquisition of psychomotor skills, learn pitch
acuities indirectly? Two, is it because they, probably as a group, have begun their music studies at an
earlier age than is customary for the other groups--somewhat validating the theories regarding critical
stages of learning? Three, is it likely that pianists, practicing and performing simultaneously sounding
tones (whereas the other four groups' subjects perform tones sequentially), are learning from the
contextual musical occurrences?
Recommendations for future research would include a similar type of study with the stimuli being complex
tones produced by actual orchestral instruments. Heterogeneous and homogeneous mixtures of timbres would
be an added dimension. This could be a pseudo replication of a study reported by Geringer (1989), in which
he reported '. . . non-majors tended to have . . . more correct discriminations to quality than to
frequency stimuli.' (p. 35) Another investigation should be made as to the importance of the maturational
influence by having groupings according to sex, age, intelligence, or scores' classifications obtained from
the subjects taking a standardized music test.
Finally, this author does not wish to leave the impression that music education has not been effective.
Lehman (1985) has said that 'the level of music teaching in the schools has never been higher . . .
(college/conservatory) freshmen play better and know more about music than the graduate students did 20
years ago . . . due simply to the magnificently successful efforts of the (educational psychologists) and
music educators.' (p. 12)
References
Bekesy, G. (E. G. Weaver, Ed. and Trans.) (1960). Auditory thresholds. Experiments in hearing. New York:
McGraw-Hill Book Company.
Bruner, C. L. (1984). The perception of contemporary pitch structures. Music Perception, 2(1), 25-40.
Frances, R. (1988). Psychological origins and development of the sense of tonality. In The perception of
music (W. J. Dowling, Trans.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Geringer, J., & Madsen, C. K. (1989). Pitch and tone quality discrimination. Canadian Music Educator,
Research Edition (Special Supplement), 29-38.
Hall, E. E. (1980). Musical acoustics: An introduction. Belmont, CA: Wadsworth Publishing Company.
Hodges, D. A. (1996). Neuromusical research: A review of the literature. In D. A. Hodges (Ed.), Handbook
of music psychology (2nd ed., pp. 197-284). San Antonio, TX: IMR Press.
Lehman, P. R. (1985). What's right with music education. Georgia Music News, 45(3), 10-12.
Madsen, C. K., & Moore, R. A. (1997). Experimental research in music: Workbook in design and statistical
tests (2nd ed.). Raleigh, NC: Contemporary Press.
Meyer, J. (1978) The dependence of pitch on harmonic sound spectra. Psychology of Music, 6(1), 3-12.
Moore, B. C. J. (1973). Frequency difference limens for short-duration tones. Journal of the Acoustical
Society of America, 54, 610.
Rakowski, A. (1977). Memory for absolute and relative pitch. Paris: Symp. Psychoacoustique Musicale.
Roederer, J. G. (1975). Introduction to the physics and psychophysics of music. New York:
Springer-Verlag.
Serafine, M. L. (1983). Cognitive process in music: Discoveries vs. definitions. Bulletin of the Council
for Research in Music Education, 73, 1-14.
Shuter-Dyson, R., & Gabriel, C. (1981). The psychology of musical ability (2nd ed.). New York: Methuen &
Co.
Sloboda, J. A. (1985). Categorical perception of frequency. The musical mind. Oxford: Clarendon Press.
Smoorenburg, G. F. (1970). Pitch perception of two-frequency stimuli. Journal of the Acoustical Society of
America, 48, 924.
Thurlow, W. R., & Bernstein, S. (1957). Simultaneous two-tone pitch discrimination. Journal of the
Acoustical Society of America, 29, 515-519.
Appendix A
Practice exercises: 'You will hear two notes played together as a perfect unison. Then one note will be
raised in frequency, gradually. When you hear two distinct pitches (that is, no longer just 'beats'),
raise your hand.' (The difference is then read and recorded.)
Practice exercise 1: ____________ Practice exercise 2: ______________
Now you are ready to take the test. You will hear, at spaced intervals of four seconds (two seconds of
sound followed by two seconds of silence) a sound to which you are to respond by writing 1 if you hear one
pitch, and 2 if you hear two pitches simultaneously. These occur relatively fast, so mark each answer
quickly and be ready for the next stimulus. Mark your answers in the columns indicated below. At the end
of each set, there will be a four-second interval. You should, during that time, get ready to start the
next column. The test administrator will call out the beginning of each set.
1. ___ 11. ___ 21. ___ 31. ___ 41. ___ 51. ___
2. ___ 12. ___ 22. ___ 32. ___ 42. ___ 52. ___
3. ___ 13. ___ 23. ___ 33. ___ 43. ___ 53. ___
4. ___ 14. ___ 24. ___ 34. ___ 44. ___ 54. ___
5. ___ 15. ___ 25. ___ 35. ___ 45. ___ 55. ___
6. ___ 16. ___ 26. ___ 36. ___ 46. ___ 56. ___
7. ___ 17. ___ 27. ___ 37. ___ 47. ___ 57. ___
8. ___ 18. ___ 28. ___ 38. ___ 48. ___ 58. ___
9. ___ 19. ___ 29. ___ 39. ___ 49. ___ 59. ___
10. ___ 20. ___ 30. ___ 40. ___ 50. ___ 60. ___
Appendix B
F R E Q U E N C I E S
(Deviations in cents)
X E3/A3 C5 E5 C6 / C3 E4 /A4 C4
1. 40 100 80 10 50 70
2. 90 50 50 80 10 40
3. 100 10 70 70 100 60
4. 50 60 40 20 20 30
5. 70 70 60 90 80 50
6. 20 20 30 60 30 90
7. 80 80 90 30 70 20
8. 10 40 20 100 40 100
9. 30 90 100 50 90 80
10. 60 30 10 40 60 10
Back to index
Proceedings abstract
Rebecca A. Pittenger
Rebecca.A.Pittenger@Dartmouth.EDU
Background:
Listeners expect unstable tones to be followed by stable tones that are close
in pitch. This is familiarly known as 'resolution'. According to our model, an
unstable tone, being unexpected, attracts frequency selective attention to
itself. A stable tone within the attention band (roughly a minor third)
captures the focus of attention, leading to a directional expectation toward
the stable anchor. Mistunings of unstable tones should thus generally be
preferred if they are in the direction of the nearest anchor (e.g., a leading
tone is preferred if mistuned sharp rather than flat). Paradoxically, because
of the band of attention around the anchor, mistunings should sound more
pronounced (more distant or dissimilar from the original) if they are in the
direction of the anchor (e.g., a mistuned leading tone should sound less
similar to a well tuned leading tone if mistuned sharp than flat).
Aims:
The aim of this study was to compare the perception of anchored and nonanchored
mistunings of nonchord tones by eliciting preference and similarity judgments.
Method:
Results:
Conclusions:
Back to index
Proceedings paper
A longitudinal study of the acquisition process of absolute pitch: an effect of subject's age on the process.
Ayako Sakakibara
JSPS Fellowships for Japanese Junior Scientists
3-19-2 Nagasaki
Toshima-ku Tokyo 171-0051 Japan
HQM01603@nifty.ne.jp
Introduction
Absolute pitch (AP) is the ability to identify or produce a musical pitch without the use of external reference tones. In this study,
some aspects of the acquisition processes of AP are investigated.
It has been suggested that everyone initially has the potential to acquire AP. However, almost no one who exceeds a certain age
can acquire AP. This "early-learning theory" states that only musical experiences during a limited early period are
effective in developing AP. According to this theory, we can expect that the ability to develop AP decreases as a person
grows older. But little attention has been given to the question of what kinds of changes reduce the possibility of acquiring AP as
one gets older.
I suppose that the acquisition processes of AP differ according to the age of the subjects. The purpose of this study is to examine the
effect of a subject's age on the acquisition processes of AP.
I trained 6 young children (non-AP possessors) to develop AP by the chord identification training method, and investigated their
acquisition processes of AP longitudinally.
In this training method, children generally start the training at 3 or 4 years old. This age group can be called the "general age". In
this study, I trained subjects who started the training when they were younger or older than general age. Subjects of this study were
3 younger subjects starting the training at 2 years old and 3 older subjects starting at 5 or 6 years old.
I would like to clarify the different characteristics of the acquisition processes of AP between the subjects of differing ages. This
attempt would offer a key to understanding the changes which decrease the possibility to acquire AP as one grows older.
Method
(1) AP training method.
This study used the chord identification training method, which is the most successful method to acquire AP (Eguchi, 1991). The
method consists of tasks for identifying some chords.
Training by the use of chords is considered well suited to acquiring AP. According to the notion that the attributes of tones have two
components, "tone height" and "tone chroma", the acquisition of AP can be regarded as developing the reference frame of "tone
chroma". The use of chords can prompt attention to "tone chroma". In the case of identifying single tones, subjects would tend to
pay attention to "tone height". In contrast, the use of chords makes it possible to construct stimuli whose "tone heights" are similar but whose "tone
chromas" are quite different. Identifying among chords can therefore lead subjects to identify chords depending on their "chroma".
Subjects did chord identification tasks every day. One session consisted of twenty to thirty trials (it took about three minutes).
Subjects had to do four or five sessions per day, thus totaling about one hundred and twenty trials per day. If subjects could identify
the chords, the number of chords was increased. When nine kinds of chords are identified perfectly, AP for every
white-key note has been acquired (Oura et al., 1981).
The training generally took about one year for the acquisition of white-key notes. The analysis of this paper deals with AP of
white-key notes (the identification of nine kinds of chords) only. The nine chords were, CEG, CFA, HDG, ACF, DGH, EGC, FAC,
HCD and GCE.
(2) Subjects.
In the chord identification training method, children generally start the training at 3 or 4 years old. This can be called the
"general case" (Sakakibara,1998,1999). The subjects of this study were children who started the training at 2 years old (younger
In the case of the younger case group, errors depending on "chroma" already appeared in Stage 1. Stage 2 of the younger case
group was the period when the percentage of errors depending on "chroma" was high.
In contrast, Stage 2 of the older case group had very few errors depending on "chroma". In the older case group, errors depending
on "height" were consistently dominant. The result suggested that older subjects tended to identify the chords mainly based on
"height".
[Stage 3]
In Stage 3 of the general case, a lot of errors were observed. Especially the percentage of errors depending on "chroma" was high.
The Stage 3 characteristics of the older case group were the same as those of the general case with respect to the low percentage of correct
answers. But the details of the errors were quite different. Errors observed in the older case group were almost all dependent on "height".
In the case of the younger case group, few errors were observed. Younger subjects came to identify the chords correctly in this
stage. Younger subjects always had a few errors depending on "height" throughout the process.
[Final Stage]
In the Final Stage, there was no difference among cases. Errors of Stage 3 decreased gradually and the percentage of correct
answers became 100% in every case.
Back to index
Proceedings paper
Introduction
Music is sequential and more or less continuous. But we hear 'chunks' of sound that pass before us; they begin and end. Models of segmentation thought to be responsible for chunking define musical
descriptors through which the degree of change in a sequence of events can be measured. Segmentation occurs when change in a parameter(s) exceeds some bound of coherence, causing a 'break'. Lerdahl and
Jackendoff proposed one model of this kind - as part of a broader theory - the Generative Theory of Tonal Music (GTTM), (Lerdahl and Jackendoff, 1983).
Figure 1: Diagrammatic view of the differences in attending to segmentation by musicians and non-musicians reported by Deliège (1987).
Models of melodic organisation that use the score as their input depart from trying to model perceptual processes because they use a representation that is an informal and partial model of WTM's organisational principles
(Baker, 1989; Friberg, Bresin, Frydén, & Sundberg, 1998; Stammen & Pennycook, 1994). Assuming that music and its notated representation are equivalent confuses aspects of human auditory mechanisms
with a symbolic communication code, and runs the risk of confusing the organisational principles of WTM with more general principles. To investigate this, a number of exploratory studies have been
undertaken.
Lerdahl and Jackendoff's GPRs have been implemented as a program, and applied to sets of melodies of different cultures transcribed via traditional western notation into symbolic codes. The data sets used in
this transcription set included German children's songs, Irish traditional melodies, Ojibwa songs and Chinese traditional melodies (table 1). The first two sets were encoded in the Essen Associative Code
(EsAC) (Kindly provided by Ewa Dahlig from the archive established by Helmut Schaffrath) and the third and fourth sets (Ojibwa and Chinese melodies) in the Kern code (von Hippel, 1998). We will use the
term Chippewa music instead of Ojibwa, as is the case in the original report (Densmore, 1909).
German 78 39 16
Chinese 30 67 31
Irish 50 77 33
Chippewa 41 56 25
The encoded melodies were presented to the 'rule program' and a number of statistics were collected, e.g. the number of times that each rule was fired during a melody. To balance the effect of uneven melodic
length this raw count was normalized by dividing the number of firings by the number of notes in the melody. The mean percentage of that rule's normalized firings was calculated for each set of melodies
(figure 2).
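The normalization just described (raw firing counts divided by the number of notes in the melody, then averaged over a data set) can be sketched as follows; the melody records and counts are invented for illustration, and the rule names follow the text (R.1 Rest, R.2 Attack/Point, R.3 Register).

```python
# Normalizing grouping-rule firing counts: divide each rule's raw firing
# count by the melody's note count, then average across the data set.
# The melody records below are invented illustrations.

from collections import defaultdict

melodies = [
    {"notes": 40, "firings": {"R.1": 2, "R.2": 8, "R.3": 10}},
    {"notes": 20, "firings": {"R.1": 0, "R.2": 5, "R.3": 6}},
]

def mean_normalized_firings(melodies):
    """Mean percentage of normalized firings per rule across melodies."""
    totals = defaultdict(float)
    for m in melodies:
        for rule, count in m["firings"].items():
            totals[rule] += count / m["notes"]   # firings per note
    return {rule: 100 * s / len(melodies) for rule, s in totals.items()}

percentages = mean_normalized_firings(melodies)
```

Dividing by note count balances melodies of uneven length, as the text notes, so that longer melodies do not dominate the per-set averages.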
Figure 2: Normalized numbers of rule usage for each transcribed data set.
When making comparisons with the results of Deliège (1987), we must bear in mind that these results reflect all situations where a rule could fire (no salience scale was implemented). The rule usage measured is
consistent with the results of Deliège's second experiment (Deliège, 1987) for three of the four data sets. The R.3 (Register rule) is the most widely used rule, followed by the R.2 (Attack/Point) rule (as it is
reported in Deliège (1987), taking into account the absence of the R.7 (Timbre rule) from this assessment). The least differentiated rule usage is that of the R.6 (Length rule). The low count for the R.1 (Rest
rule) comes somewhat as a surprise, considering that it is often considered to be the chunking cue (Narmour, 1990; Deliège, 1987):
"The sensitivity to a sensation of a gap in music perception may be considered, by the way, as a key element in the grouping behaviour." (Deliège 1987, p.343)
The following observations about segmentation processes in these different musical cultures seem valid:
● In all four sets, the importance of the Length rule (R.6) is similar.
● The Rest rule (R.1) is largely absent from all but the European melodies.
● In the Chippewa set, there is a preference for using the Attack/Point rule (R.2) over the Register rule (R.3). The situation is reversed for the other three sets.
However, it is questionable whether even such a general interpretation of the measures can be made with any confidence, given the uncertainties associated with the encoded information. The main problems
arising from pitch and duration transcription apply mostly to the data sets from the musical cultures most remote from the WTM system, namely Irish traditional and Chippewa music (Densmore,
1909; Mulheir, 1991; Henebry, 1928); these are briefly described below. For research on similarities and historical relationships between Western and Chinese scale systems, see Kuttner (1975) and Dowling &
Harwood (1986). The following problems apply, to varying degrees, to the transcription of any non-Western music.
Pitch transcription
What Densmore calls "incorrect tones" are tones that are at a categorical granularity that is either finer than or out of step with the chromatic pitch system of WTM. These are transcribed with symbols
representing a pitch "slightly less than a semitone higher/lower than the proper pitch" (Densmore, 1909, Vol. II, p. XIX; Abraham & von Hornbostel, 1909-1910). Given that the human 'just
noticeable difference' (JND) for pitch is approximately 1/12th of a semitone (Howard & Angus, 1996), 'slightly less' carries very little information.
file:///g|/poster2/Serman.htm (3 of 11) [18/07/2000 00:35:11]
Brief description of L&J rules and applications -ref Deliege
The vibrato (wavering tone) prevalent in both Chippewa and Irish singing styles can be indicated by a symbol (absent from the above-mentioned data sets), but even then the influence of the
vibrato-induced pitch change on perceptual chunking is lost.
Interval transcription
When looking at Densmore's extensive statistics, one is struck by the variety of scale types and intervals used by Chippewa singers that do not conform to the WTM system definitions and classifications
(Burns, 1999). Without deeper analysis of this variety, the appropriateness of the Register rule definition (GPR 3a) as given in (Lerdahl & Jackendoff, 1983) for Chippewa melodies is questionable.
Metrical transcription
Both Irish and Chippewa singers' structuring of time often completely evades the rules of WTM transcription. In many songs, Chippewa singers introduce time changes almost every measure (Densmore, 1909).
One of the problems with the Irish tune transcriptions comes from the dominance of bar lines over note importance, which often obscures the accents present in the performance of the melody. Moreover,
airs (both slow and quick) have no definite time structure, at least not in the WTM sense, and their transcription is therefore particularly "untrustworthy" (Henebry, 1928).
Durational transcription
Leaving aside prolongations of notes, which all three of the aforementioned authors report to be extremely difficult to transcribe, the virtual non-existence of rests in three of the four transcribed data sets
is highly unlikely. Despite Densmore's observation that rests occurred in only 4% of the transcribed songs, from the perceptual point of view rests do not have to fit into any metrical system in order to be
perceived as such. The absence of rest indicators may therefore be a serious omission, and it brings into question the Rest rule (GPR 2a) results (figure 2).
When reviewing the counts above and reading the reports of the transcription of Chippewa and Irish melodies, two conclusions seem inescapable:
1. The number of features of the real sound that the ethnomusicologist has to omit in order to make the transcription at all is unknown, but probably significant.
2. The notation is incomplete and hence inappropriate as a representation of perceptual properties. We have no way of measuring the potential role of absent descriptors.
These problems lead us to ask the following question: do the variations in organisation reflected in the rule frequencies imply that rules are used differently in different musical cultures, are they
meaningless, or do they simply reflect properties of the WTM notation system? We will not know unless we analyse perception independently of notation. We must start with sound rather than
with an incomplete, theory-laden abstraction.
The MusicTracker
The software tool described here is called the MusicTracker. It extracts information directly from the digitally recorded music signal. The present version is suitable for application to monophonic music and its
development is still in progress. It has been developed in MATLAB and is provided with a graphical user interface.
The MusicTracker reads a 'wav-file', i.e. an auditory signal expressed in an array of numbers representing changes in the acoustic air pressure over time. The signal array is divided into consecutive 'frames' of
equal duration - about 20 ms, and for each frame the values of pitch, perceptual dynamics and timbre indicator are computed.
Due to the fixed frame duration, the string of consecutive values of each indicator forms a discrete function of time representing the change in that indicator over the course of the melody. These functions can be
represented graphically or stored for further manipulation, either in raw form (as computed) or smoothed by filters with selectable bandwidths.
In order to eliminate the influence of the pre-set physical dynamic range depending on the sound record quality, the values of perceptual dynamics indicator of all frames of the melody are normalized. All
values are divided by the highest value of the perceptual dynamics indicator found in the entire sequence. Thus, dynamics are bounded within the range from 0 to 1.
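The frame scheme can be illustrated with a short Python sketch. This is not the MATLAB implementation itself: RMS energy per frame is used here as a crude stand-in for the perceptual dynamics indicator (whose actual definition is not given above), and the test signal is synthetic.

```python
import numpy as np

def dynamics_indicator(signal, sample_rate, frame_ms=20):
    """Cut the signal into consecutive 20 ms frames, compute a crude
    dynamics indicator (RMS energy) per frame, and normalize by the
    maximum so values lie in [0, 1], as described above."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    n_frames = len(signal) // frame_len              # drop the remainder
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))        # one value per frame
    return rms / rms.max()                           # bounded in [0, 1]

# Synthetic 1 s sine tone whose amplitude doubles halfway through
sr = 8000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t) * np.where(t < 0.5, 0.5, 1.0)
dyn = dynamics_indicator(sig, sr)
print(len(dyn), dyn.max())   # 50 frames; maximum is exactly 1.0
```

The normalization makes recordings of different absolute level comparable, which is the point of dividing by the sequence maximum.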
Figure 3: Transcribed WTM notation of the first few seconds of the shakuhachi piece.
Figure 4: Applying GPRs to the notated shakuhachi excerpt results in no rules firing.
However, when we listen to the fragment it is clear that the dominant perceptual changes are occurring not in the development of the pitch indicator but in the loudness and timbre indicators. This can be seen
in the graphic representation of the MusicTracker indicators (figure 5). This is perhaps not surprising when we consider that shakuhachi players can use 6-8 different tone qualities on the same tone (Dai Shihan
Further development
The initial analysis described above indicates some of the advantages of working with real sound. It has made it possible to examine issues in modelling melodic segmentation at a level
closer to a real performance, and to compare the results with those obtained from notational transcriptions. However, the MusicTracker is still under development, and a number of improvements are
planned. The indicators of perceptual dynamics and timbre are currently defined by simplifications of the perceptual descriptors, derived from existing knowledge in psycho-acoustics and related
disciplines; this is especially true of the timbre indicator.
The application of the MusicTracker to many monophonic melodies performed by various sound sources (including the human voice, oriental instruments such as the shakuhachi, and computer-generated
sounds) has shown its ability to indicate significant changes in pitch, perceptual dynamics and timbre. This makes it a useful tool, albeit a first approximation, for the computational investigation of
melodic segmentation across cultures.
In order to investigate perceptual processes and build a model of melodic segmentation two problems have to be addressed. Firstly, there is the problem of modelling in a more realistic way the sensory and
perceptual information available to the listener when listening to melodies. Secondly, we need to take into account the fact that this perceptual information is conditioned heavily by the music culture that the
listener belongs to, i.e. that the organisation and balance of the contribution of descriptors within a musical culture is to some extent learned. While the descriptors themselves may be considered to be
hard-wired gestalt processes, how they are used together has to be learned. It is this aspect of melodic organisation that the developing MusicTracker will be used to model. Thus we hope it will help clarify the
validity or otherwise of models such as GTTM's GPRs and thus contribute to an increasingly scientific approach to the investigation of melodic processes.
Figure 6: Applying GPRs to the MusicTracker results for pitch from the shakuhachi excerpt. The numbers at the top represent the number of the rule that fired; in this case, the
Attack/Point rule (GPR 2b) fired twice.
Acknowledgments
We would like to thank Mr Tomizu Inzan, Master of the Tozan School of Shakuhachi, Mr Kevin Hayes, Ms George Mulheir, Dr Ewa Dahlig, Paul von Hippel and Dr Donncha O'Maidin for their comments
and suggestions as well as for the recordings, transcriptions and encoded data sets.
References
Abraham, O., & von Hornbostel, E. M. (1909-1910). Vorschlaege fuer die Transkription exotischer Melodien. Sammelbaende der Internationalen Musikgesellschaft, 1-25.
Baker, M. (1989) An artificial intelligence approach to musical grouping analysis. Contemporary Music Review, 3, 43-68.
Burns, E. M. (1999). Intervals, Scales and Tuning. In Deutsch, D. (Ed.), The Psychology of Music San Diego: Academic Press, 215-264.
Crossley-Holland, P. (Ed.). (1974). Selected Reports in Ethnomusicology, 2(1)
Densmore, F. (1910). Chippewa Music. Washington: Government Printing Office.
Deliège, I. (1987). Grouping conditions in Listening to Music: An Approach to Lerdahl and Jackendoff's Grouping Preference Rules. Music Perception, 4(4), 325-360.
Dowling, W. J., & Harwood, D. L. (1986). Music Cognition. San Diego: Academic Press.
Erdely, S. & Chipman, R. (1972). Strip-chart recording of narrow band frequency analysis in aid of ethno musicological data. 1972 Yearbook of the International Folk Music Council, 120-136.
Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Journal of the Acoustical Society of America, 5, 82-108.
Friberg, A., Bresin, R., Frydén, L., & Sundberg, J. (1998). Musical Punctuation on the Microlevel: Automatic Identification and Performance of Small Melodic Units. Journal of New Music
Research, 27(3), 271-292.
Fujie L. (1992) East Asia/Japan. In Todd Titon J. (Ed.), Worlds of Music: an introduction to the music of the world's peoples. New York: Schirmer Books, 318-375.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America 61(5), 1270-1277.
Hajda, J. M., Kendall, R. A., Carterette, E.C., & Harshberger, M. L. (1997). Methodological issues in timbre research. In Deliège I., & Sloboda J. (Eds.), Perception and Cognition of Music.
Psychology Press, 253-305.
Handel, S. (1995). Timbre perception and auditory object identification. In B. C. J. Moore (Ed.), Hearing, New York: Academic Press, 425-461.
Henebry, R. (1928). A handbook of Irish music. London: Cork University Press
Hippel, P. T. von. (Ed.). (1998). 42 Ojibwa songs in the Humdrum **kern representation: Electronic transcriptions from the Densmore collections [computer database].
Houtsma, A. J. M. (1997). Pitch and Timbre: Definition, Meaning and Use. Journal of New Music Research, 2, 104-115.
Howard, D.M., & Angus, J. (1996). Acoustics and psychoacoustics, Oxford: Focal Press.
West, R., Howell, P., & Cross, I. (1991). Musical structure and knowledge representation. In Howell P., West, R., & Cross, I.(Eds.), Representing Musical Structure. London, Academic Press.
Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attributes of musical timbre. Journal of the Acoustical Society of America. 94(5), 2595-2603.
Jones, M. R., & Boltz, M. (1989). Dynamic Attending and Responses to Time. Psychological Review, 96(3), 459-491
Krumhansl, C. L., Iverson, P. (1992). Perceptual interactions between musical pitch and timbre. Journal of Experimental Psychology: Human Perception and Performance, 18(3), 739-751.
Kuttner, F. A. (1975). Prince Chu Tsai-Yü's life and work: a re-evaluation of his contribution to equal temperament theory. Ethnomusicology, 19(2), 163-204
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Mulheir, G. (1991). The structure and function of the 'sean-nós' tradition in Connemara, Ireland and the effects of outside influence on the Irish oral tradition (vols. I & II). Unpublished
manuscript, University of Sheffield.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures. Chicago:University of Chicago Press.
Rossing, T. D. (1990). The Science of Sound, Addison-Wesley.
Seeger, C. (1958). Prescriptive and Descriptive Music-Writing. The Musical Quarterly, 44(2), 184-195.
Selfridge-Field, E. (Ed.). (1997). Beyond MIDI - The handbook of Musical Codes. Cambridge, MA: MIT Press.
Serafine, M. L. (1988). Music as Cognition, New York: Columbia University Press.
Stammen, D. R., & Pennycook, B. (1994). Real-time segmentation of music using an adaptation of Lerdahl and Jackendoff's Grouping Principles. In Proceedings of the 3rd International
Conference on Music Perception & Cognition. Liège, Belgium: European Society for the Cognitive Sciences of Music, 268-270.
Proceedings paper
Introduction
The process of "tonal organization" is an indispensable part of melody perception. The cognitive
organization of tones within a melody shapes the "Gestalt" of pitch in the tone sequence. The function
of tonal organization (which is greatly dependent on "tonal schema") is to organize input tone
sequences into a system of tonality which is centered around "tonal center" (Abe & Hoshino, 1990).
In the process of tonal organization, each tone of a pitch sequence is related to the tonic according to
the relative distance (interval) and assigned tonal function in hierarchical relation to other pitches
(Dowling, 1994; Krumhansl, 1990). When tonal organization is well-formed, the tonal center is fixed
in the mind. That is, as a result of this process, we can perceive tonality.
Well-formed tonal organization is not easily achieved for all pitch sequences. Usually, listeners have
difficulty finding the tonal center of atonal pitch sequences. Some pitch sequences may have several
possible tonal organizations, leading to ambiguous key perception. Generally, when a stimulus object
has several possible and plausible patterns of organization, we will be just as likely to follow one
pathway as another. However, if the object is preceded by a context which eliminates all but one
possible pathway, we can perceive the object in one stable organization as determined by this context.
In the case of a pitch sequence with an ambiguous key, if the sequence is preceded by a context which
establishes only one key of several possible keys, the tonal organization of the sequence will proceed
according to the context driven key. For example, if the pitch sequence "F4 G4 F4 Bb4 D4 C4" (key =
Bb major or F major) is preceded by the Bb major scale, it will be tonally organized in the key of Bb
major.
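The ambiguity of the example melody can be illustrated with a small sketch. This is our own illustration, not the study's key-assignment procedure: bare diatonic membership alone would also admit Eb major, so the sketch adds a crude plausibility filter of our own, namely that the tonic itself must occur in the sequence.

```python
# Which major keys fit a pitch sequence? Pitches are given as pitch-class
# names (Bb written as A#); octave numbers are ignored for key membership.

NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]   # major-scale intervals in semitones

def major_scale(tonic):
    root = NOTES.index(tonic)
    return {(root + s) % 12 for s in MAJOR_STEPS}

def possible_major_keys(pitches):
    pcs = {NOTES.index(p) for p in pitches}
    # diatonic membership, plus the crude filter that the tonic occurs
    return [t for t in NOTES
            if pcs <= major_scale(t) and NOTES.index(t) in pcs]

# "F4 G4 F4 Bb4 D4 C4": ambiguous between F major and Bb major
print(possible_major_keys(['F', 'G', 'F', 'A#', 'D', 'C']))   # ['F', 'A#']
```

A preceding Bb major scale would then be expected to resolve this ambiguity toward Bb, as the text describes.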
One may question whether or not the melody "F4 G4 F4 Bb4 D4 C4" organized in Bb major is
perceived differently from the same melody organized in F major. This is not a difficult question for
ambiguous visual stimuli such as "figure-ground reversible figures (vase/profile reversible figure)",
"depth reversible figures", and "ambiguous figures (My wife and mother-in-law)". It is evident that
what is perceived in one interpretation (a vase) is different from what is perceived in another
interpretation (two faces in profile). However, the question of interpretation is more difficult for aural
stimuli, such as the aforementioned ambiguous melody. A simple method of examining this question is to prepare pitch sequences that pair the ambiguous melody with a context establishing
each possible key, and then to compare listeners' interpretations.
Dowling (1986), using a method similar to the one above, examined listeners' recognition of melodies
and which of two types of pitch information, intervals or scale steps, listeners use in encoding
melodies. He asked listeners (inexperienced, moderately experienced, and professional listeners) to
perform a long-term transposition recognition task of melodies which were preceded by a chordal
context that established the key of each melody. The experiments demonstrated that inexperienced
listeners (as well as professionals) performed the task regardless of context, and that moderately
experienced listeners performed differently as a function of context. Further, professional musicians
performed most accurately. He proposed that inexperienced listeners use pitch-interval
representations, moderately experienced listeners use scale-step representations, and that professional
musicians use more sophisticated representations, in their memory for melodies.
From Dowling's (1986) results, it may be concluded that inexperienced and professional listeners are able to
recognize that two identical melodies presented with different contexts are the same, whereas
moderately experienced listeners perceive the same stimuli as different. The question of interest to
this study is what factors account for the difference in tonal organization between these groups of
listeners. It is possible that the tonal representation strategies of experienced listeners may be
differentiated by pitch perception strategies (the use of absolute pitch or relative pitch), rather than
their degree of musical training. That is, it seems that the way that AP experienced listeners perceive
and encode melodies may be somewhat different from non-AP experienced listeners. Second,
Dowling's long-term recognition task (which is rather similar to the short-term situation) may focus on
one of several aspects in the melody recognition process. As Dowling has demonstrated, melody
recognition strategy varies depending on the delay time between the standard and comparison
melodies (Dowling, Kwak and Andrews, 1995). Therefore, the effect of context should be examined
in other types of melody recognition tasks. For example, simple immediate recognition and
recognition after a period of learning could be explored. Until these factors have been taken into
account, the question of how ambiguous melodies are perceived has not yet been resolved.
This paper examines the effect of "context key" on recognition of melodies with two possible
keys, using short-term (Experiment 1) and long-term (Experiment 2) recognition tasks. Secondly, it
assesses the difference in melody representation strategies between AP possessors and others
(experienced listeners without AP, and inexperienced listeners). The main interest of this study is the
recognition of ambiguous melodies preceded by different context stimuli. In order to explore this
question as directly as possible, it was decided that a simple non-transposed recognition task would
serve the design of the present study better than a transposed recognition task. It was also important to be
sure that pitch sequences were reliably key-ambiguous. Therefore, a set of the pitch sequences from
Yoshino and Abe (1996) and Yoshino (1998b) which were verified as having two possible and
plausible key interpretations were used in the present study.
Experiment 1
Experiment 1 examines the effect of "context key" on melody recognition in short-term memory,
using a simple discrimination recognition task with a short retention interval.
Method
Subjects
45 subjects, graduate and undergraduate students at Hokkaido University, participated in the
experiment. They were grouped into 3 groups (15 in each). The inexperienced group (IE group)
consisted of 15 subjects who had studied music for less than 4 years or not at all. The remaining 30
subjects had each studied music for more than 10 years. Of these 30 experienced listeners 15
possessed absolute pitch (AP group) and 15 did not (EN group).
Stimuli
32 standard stimuli consisted of six tones and were presented at a rate of two tones per second. Each
standard melody had two possible keys on the diatonic scale. A preliminary experiment confirmed
that listeners found the key of the melodies sufficiently ambiguous. That is, of the two possible keys
that could be identified for each melody, one was not chosen significantly more often than the other.
32 comparison stimuli also consisted of six tones and were presented at the same rate as the standard
melodies. Half of them were the same as the standard stimuli, and the other half differed from the
standard stimuli by the alteration of only one tone by one diatonic step, in a way that altered neither the
contour nor the original key (the altered tone was always in the 2nd to 5th position, never the 1st or 6th).
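The constraint on comparison stimuli can be sketched as follows. This is a hypothetical illustration: melodies are represented as diatonic scale-step indices within the context key, and the example melody is invented, since the paper does not list its stimuli.

```python
# Single-tone alterations at positions 2-5 (0-based indices 1..4) of a
# six-tone melody, by one diatonic step, keeping the melodic contour
# (the up/down/same pattern of successive intervals).

def contour(melody):
    def sign(x):
        return (x > 0) - (x < 0)
    return [sign(b - a) for a, b in zip(melody, melody[1:])]

def valid_alterations(melody):
    """All one-step, one-tone changes at inner positions that keep contour."""
    out = []
    for i in range(1, len(melody) - 1):     # never the 1st or last tone
        for step in (-1, +1):
            altered = list(melody)
            altered[i] += step
            if contour(altered) == contour(melody):
                out.append(altered)
    return out

melody = [0, 2, 1, 4, 6, 5]                 # hypothetical scale steps
print(len(valid_alterations(melody)))       # -> 5
```

Not every one-step change survives the contour check (e.g. lowering the second tone of the example creates a repeated pitch, changing the contour), which is presumably why the altered position had to be chosen with care.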
Each melody (standard and comparison) was preceded by a scale context. The scale context of a
standard melody is called "first context scale", and that of a comparison is called "second context
scale". The scale context stimuli consisted of upper octave scales from one of the two possible keys
for each melody, presented at a rate of two tones per second. The key of the scale context was selected
from the two possible keys for the standard or comparison melody. In order to strengthen the sense of
key, the tonic tone of the context scale was presented simultaneously with the melodies as an
additional context cue. The tonic tone was played one octave below the original pitch for 3 seconds.
In half of the trials, the key of first context scale and second context scale was the same (same context
condition). In the other half of the trials (different context condition), the key of second context scale
was different from that of first context scale (the other possible key). For example, in the different
context condition, the melody composed of "C4 F4 G4 F4 E4 A4" has two possible keys: "C major"
and "F major". If the key of the first context scale (preceding the standard melody) was set to C major,
the key of the second context scale (preceding the comparison melody) was set to F major.
All stimuli were presented on a Yamaha tone generator MU50 using MIDI controlled by the MIDI
sequencer software "Performer". In order to distinguish the standard and comparison melodies from
the context stimuli, melodies were produced using a piano timbre (Grand Piano) and the context
scales and tones using an organ timbre (Church Organ). The stimuli were presented at a comfortable
listening level that could be adjusted by the subjects.
Procedure
A trial consisted of a first context scale (3s), a standard melody with a context tone (3s), a second
context scale (3s) and a comparison melody with a context tone (3s). Subjects' task was to indicate
whether the standard and the comparison melodies were same or different within the inter-trial
interval of 5 seconds. They were asked to ignore the context stimuli. After 4 practice trials, each
subject performed 32 trials presented in random order.
Results
The variables of interest in this experiment are the hit rate (HIT), the false alarm rate (FA) and the
recognition probability. A hit is defined as a SAME response for stimuli in which standard and
comparison melodies are identical, and a false alarm as a SAME response for stimuli in which the
melodies differ. A recognition probability is defined as the HIT rate minus the FA rate.
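These measures can be computed directly; the sketch below follows the definitions above, with invented trial data for illustration.

```python
# HIT: proportion of SAME responses to identical pairs.
# FA:  proportion of SAME responses to different pairs.
# Recognition probability: HIT - FA.

def recognition_probability(responses):
    """responses: list of (is_same_pair, said_same) booleans, one per trial."""
    same = [said for is_same, said in responses if is_same]
    diff = [said for is_same, said in responses if not is_same]
    hit = sum(same) / len(same)
    fa = sum(diff) / len(diff)
    return hit - fa

# 4 identical pairs (3 judged SAME) and 4 different pairs (1 judged SAME)
trials = [(True, True), (True, True), (True, True), (True, False),
          (False, True), (False, False), (False, False), (False, False)]
print(recognition_probability(trials))   # 0.75 - 0.25 = 0.5
```

Subtracting FA corrects the hit rate for a bias toward answering SAME regardless of the stimuli.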
Figure 1 shows the mean recognition probability for the three subject groups and the two types of
context condition. These recognition probability data were analyzed in a two-way analysis of variance
[3 (Subject Group) x 2 (Context Conditions)], which yielded only a main effect of subject group, F (2,
42) = 68.18, p < .0001. AP subjects (M = .80) performed significantly better than EN subjects (M =
.48), and EN subjects performed significantly better than IE subjects (M = .13), in Tukey's Test (HSD
= .14, p < .05). The main effect of context was not significant, F (1, 42) = .26, p > .10.
Discussion
In the short-term recognition task, there were no differences between same and different context
conditions for all three listener groups. Listeners performing this task did not appear to be influenced
by the context stimuli. It was unexpected that the performance of EN subjects was not influenced by
context. They seemed to attend to pitch change, a perceptual feature, rather than to whole-melody
difference, a cognitive feature. It appears that the second context scale in this experiment was not strong
enough to affect tonal organization. This result converges with previous findings that information
based on pitch-interval pattern doesn't contribute to immediate recognition of novel melodies
(Dowling, 1978, 1991; Dowling, Kwak & Andrews, 1995).
As expected AP listeners performed best, followed by EN listeners, and the IE group performed most
poorly. It appeared that AP possessors could recognize melodies correctly by linguistically encoding
the pitches of standard melodies to absolute pitch names and maintaining them in memory. Based on
the questionnaire data, it seemed that EN listeners could sometimes encode the pitches of standard
melodies to scale names (movable do), and sometimes they could not. For IE listeners, the main
difficulty was maintaining the pitch information of the standard melody while listening to the second
context scale.
Experiment 2
Experiment 2 examines the effect of "context key" on melody recognition in long-term memory. This
experiment adopts a simple long-term recognition task consisting of a learning session and a
recognition session. In the learning session, subjects listened to standard melodies preceded by a
context scale five times and were asked to remember them. Then, in the recognition session, they
were asked to discriminate standard melodies from distractor melodies preceded by a context scale.
Method
Subjects
39 subjects, graduate and undergraduate students at Hokkaido University, participated in this
experiment. There were 13 inexperienced listeners (IE group), 13 experienced listeners without AP
(EN group), and 13 experienced listeners with AP (AP group). EN and AP subjects had studied music
for more than ten years. None of these subjects participated in Experiment 1.
Stimuli
48 melodies were used in this experiment, including the 36 melodies used in Experiment 1. As before,
each melody could be assigned either of two possible keys in the diatonic scale. All of the melodies
consisted of six tones and were presented at a rate of two tones per second, as in Experiment 1. Half
of the 48 melodies were used as the standard melodies to be remembered, and were presented in both
the learning session and the recognition session. The other half were used as distractors in the
recognition session. In the recognition session, standard melodies were presented without any pitch
changes.
The context stimuli were presented before the standard melodies and the distractors. In each case, the
context consisted of a scale and a tonic tone, in one of the two possible keys for the melody. In the
learning session, each standard melody was presented five times, with the same preceding key
context. In the recognition session, half of the melodies were preceded by the same context key as in
the learning session (same context condition); the other half were preceded by the alternate context
key (different context condition).
The context key of each distractor melody was arbitrarily selected from the two possible keys for the
distractor melody, independent of any condition in this experiment. The rate of presentation for
context and the duration of the tonic tone were identical to Experiment 1.
All stimuli were generated and stored as standard MIDI file by the MIDI sequencer software
"Performer" on Macintosh computer, and presented on a Yamaha tone generator MU50. A customized
program generated with PsyScope software was used to control stimulus presentation and to record
responses. Stimulus timbres were the same as in Experiment 1: standard melodies and distractors in
piano timbre (Grand Piano), and context scales and tones in organ timbre (Church Organ). The stimuli
were presented at a comfortable listening level that could be adjusted by the subjects. Subjects were
tested individually in a soundproof room.
Procedure
Learning Session. Each pitch sequence with its context was presented five times in succession. As noted
above, the same key context preceded a standard melody on each of its five presentations. Subjects were instructed to
listen to the melodies and to remember them for a subsequent memory task.
Recognition Session. After a 10-minute retention interval, recognition memory was tested in a
forced-choice task. Each trial began with a scale context, followed by the standard or distractor
melody with a tonic tone. Subjects were asked to indicate whether the melody was "old" or "new" by
pressing the "1" and "3" keys on a keyboard as quickly as possible. In other words, they listened to a
melody and indicated whether or not they had heard the melody in the preceding learning session.
They were also asked to ignore the context. All subjects performed the 48 trials in random order.
Results
The main variables in this experiment were the hit rate (HIT), the false alarm rate (FA) and the
recognition probability, as in Experiment 1. However, in this case recognition probability in each
context condition was calculated using the mean FA, because FA could not be calculated for each
context condition.
Figure 2 shows the mean recognition probability for the three subject groups and the two types of
context condition. These recognition probability data were analyzed in a two-way analysis of variance
[3 (Subject Group) x 2 (Context Conditions)], in which only a main effect of context condition was
obtained, F (1, 36) = 32.14, p < .0001. Subjects performed significantly better in the same key
condition (M = .52) than in the different key condition (M = .34). The main effect of subject group
was not significant, F (2, 36) = .92, p > .90. There was no significant difference in recognition
probability between the AP (M = .44), EN (M = .42), and IE (M = .42) groups. There was a significant
interaction effect, F (2, 36) = 3.49, p < .05. This interaction reveals that the difference between the AP
group's recognition performance in the same and different context conditions was much larger than that of
EN and IE groups. The simple main effect of context condition for each subject group was also
significant, HSD = .08, p < .05. However, the simple main effect of subject group was not significant
for any of the context conditions.
Discussion
In the long-term recognition task, there were significant differences between same and different
context conditions for all three listener groups. Not only EN listeners' performance, but also IE and
AP listeners' performance were influenced by context key. Hit rates in the different key context were
near chance level (.50 for AP listeners, .58 for EN listeners, and .61 for IE listeners). That is, they
didn't seem to be able to recognize a melody preceded by a different key context even though it was
the same melody they had heard in the learning session. Because the FA rate was not high (about .22
for all subject groups), the difficulty of this experiment task could not account for such low hit rates. It
is probable that listeners perceived the same melody differently as a function of differing key contexts
which led to different pathways of tonal organization for the same melody.
Although a main effect of subject group was obtained in Experiment 1, no such effect occurred in
Experiment 2, a difference that warrants further examination. In short-term recognition,
experienced listeners including EN and AP groups had the advantage of their musical ability in
remembering tones by linguistically labeling them. In Experiment 2, however, the retention
interval was much longer (about 25 minutes), and even experienced listeners could not maintain
linguistic labels in memory for that duration. Listeners in both of these groups seemed to depend on
similar memory traces, based on the tonal organization of the standard melodies. It has been shown that
recognition of pitch patterns based on familiar melodies is high not only for experienced listeners
but also for inexperienced listeners (Attneave & Olson, 1971; Dowling & Fujitani, 1971;
Smith et al., 1994). The recognition condition in this experiment may be similar to that of hearing
familiar melodies, because subjects had previously heard the standard melodies repetitively. The
difference between subject groups was evident only in the interaction between subject group and
context. The differences between recognition probabilities in each context condition were about .11
for EN and IE listeners, and .28 for AP listeners. This result is difficult to interpret, as it suggests that
AP listeners did not depend on their absolute pitch ability, but rather used relative pitch information
more than non-AP listeners did.
General Discussion
The present study demonstrated that recognition of ambiguous melodies was influenced by context
key in a long-term task, but not in a short-term task. These findings suggest that an ambiguous melody
contextualized in one key can be perceived differently from the same melody contextualized in
another key, when there is a long retention interval between them. On the other hand, it is unsurprising that
there was no context effect in the immediate recognition task as subjects were simply detecting any
change of pitch. This difference in the effect of context between short-term and long-term memory
seems to be analogous to the difference in the contribution of contour and pitch interval information
between short-term and long-term memory. However, we do not yet have positive evidence that
context has no effect on ambiguous melodies in short-term memory. Thus, it is necessary to examine
the effect of context on a short-term recognition task that is not dependent on subjects' detection of
pitch change.
The effect of context key shown in Experiment 2 could be considered part of the general context
effect in "state-dependent memory". However, we should note that this effect occurred in the special
case of melodies that could be perceived in two possible keys. If this effect could be explained by
simple state-dependent memory, recognition of "unambiguous melody" would be influenced by
context key. However, it is questionable whether a melody with only one possible key is recognized
as a different melody when it is preceded by a different key context.
The strategies used by experienced listeners to perceive and recognize pitch sequences are not yet fully
understood. In Experiment 1, there was an expected difference in recognition performance between
AP listeners and EN listeners. In this type of immediate recognition task, the strategy AP listeners use
should be clearly different from that of non-AP experienced listeners. However, this difference in
pitch-perception strategy was not found in Experiment 2. On the contrary, a result opposite to our
expectations was observed: recognition performance was similar for all three subject groups.
Furthermore, AP listeners showed a tendency to depend more on relative pitch information or
information based on tonal organization (rather than absolute pitch information) than EN listeners. It
seems likely that AP listeners can best use their AP ability while rehearsing pitch names in
short-term tasks, or once they have stored the pitch names in long-term memory. In Experiment
2, AP listeners seemed unable to remember enough pitch names for each melody to use them in the
recognition session. Presumably, if AP listeners were given enough opportunity and time to store the
pitch names for each melody, their recognition performance would improve.
Inexperienced listeners recognized fewer melodies in Experiment 1. This result converges with
previous findings that recognition tasks requiring accurate interval recognition are difficult for
inexperienced listeners (Bartlett & Dowling, 1980; Cuddy & Cohen, 1976; Trainor &
Trehub, 1992). It may not be surprising that inexperienced listeners performed as well as experienced
listeners in Experiment 2, and that their performance was influenced by context key. The nature of the
task in Experiment 2 does not require special musical ability. Every listener seemed to depend on
auditory memory traces which are not related to musical training. Yoshino (1998a) suggested that
inexperienced listeners can also carry out tonal organization as experienced listeners do, and that they
can interpret the key of a melody, at least, implicitly. Inexperienced listeners seemed to tonally
organize a melody according to the context key, and consequently perceived two identical melodies
presented with different contexts as different.
References
Abe, J., & Hoshino, E. (1990). Schema driven properties in melody cognition: Experiments on final
tone extrapolation by music experts. Psychomusicology, 9, 161-172.
Attneave, F., & Olson, R. K. (1971). Pitch as medium: A new approach to psychophysical
scaling. American Journal of Psychology, 84, 147-166.
Bartlett, J. C., & Dowling, W. J. (1980). The recognition of transposed melodies: A key-distance
effect in developmental perspective. Journal of Experimental Psychology: Human Perception &
Performance, 6, 501-515.
Cuddy, L. L., & Cohen, A. J. (1976). Recognition of transposed melodic sequences. Quarterly Journal
of Experimental Psychology, 28, 255-270.
Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies.
Psychological Review, 85, 341-354.
Dowling, W. J. (1986). Context effects on melody recognition: Scale-step versus interval
representations. Music Perception, 3, 281-296.
Dowling, W. J. (1991). Tonal strength and melody recognition after long and short delays. Perception
& Psychophysics, 50, 305-313.
Dowling, W. J. (1994). Melodic contour in hearing and remembering melodies. In R. Aiello (Ed.),
Musical perception. Oxford: Oxford University Press.
Dowling, W. J., & Fujitani, D. S. (1971). Contour, interval, and pitch recognition in memory for
melodies. Journal of the Acoustical Society of America, 49, 524-531.
Dowling, W. J., Kwak, S., & Andrews, M. W. (1995). The time course of recognition of novel
melodies. Perception & Psychophysics, 57, 136-149.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. Oxford: Oxford University Press.
Smith, J. D., Kemler Nelson, D. G., Grohskopf, L. A., & Appleton, T. (1994). What child is this?
What interval was that? Familiar tunes and music perception in novice listeners. Cognition, 52, 23-54.
Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants' and adults' sensitivity to Western
musical structure. Journal of Experimental Psychology: Human Perception & Performance, 18,
394-402.
Yoshino, I., & Abe, J. (1996). Cognitive modeling of the process of tonal organization in melody
perception. International Journal of Psychology, 31, 51.
Yoshino, I. (1998a). Can non-musicians interpret the key of a melody? Proceedings of the fifth
international conference on music perception and cognition, 225-229.
Yoshino, I. (1998b). Key interpretations of the melodies composed in various periods. Journal of
Music Perception and Cognition, 4, 81-99. (in Japanese)
Proceedings abstract
MUSIC IN EVERYDAY LIFE
G. Bertling
Rolf.G.Bertling@ruhr-uni-bochum.de
Background:
It has long been known that musicians perceive and deal with music differently
from nonmusicians. Whether these differences also extend to everyday activities
is not yet known.
Aims:
Method:
Results:
After matching for age and sex, the mean age was 28 years. Mean weekly
listening time was about 16 hours in both groups. While nonmusicians
listened to music significantly more for "entertainment", as "background", and
in "social situations" (dancing, discos, etc.), musicians dealt with music more
for educational and professional purposes. No differences were found in the
number of sound recording and storage media owned.
The musical preferences of musicians reflected the styles they were mostly
occupied with professionally (classical, jazz). In contrast, nonmusicians
preferred pop, rock, and modern pop styles.
Conclusions:
The results imply that for musicians there is no clear distinction between
professional and private engagement with music, while for nonmusicians
music predominantly means "fun".
Proceedings abstract
MUSIC IN EVERYDAY LIFE OF PATIENTS WITH MENTAL DISEASE
Rolf.G.Bertling@ruhr-uni-bochum.de
Background:
Clinical observation shows that music plays an important role in the everyday
life of patients with mental illness, but there are few empirical data on how
psychiatric patients deal with music in their daily routine.
Aims:
This study investigates the musical education, musical skills and use of music
in daily life of patients with mental disease in comparison with a sample of
healthy people.
Method:
After matching for age and sex, 131 psychiatric in-patients from different
diagnostic subgroups and 86 healthy people (mean age 43 years) completed a
self-constructed questionnaire.
Results:
Conclusions:
The patients' musical education, musical skills, and active music-making were
comparable to those of the normal population. Concerning the receptive use of
music, however, the psychiatric patients showed different patterns from the
normal population. This might indicate that psychiatric patients use music in a
specific way.
Proceedings paper
INTRODUCTION
It has recently been suggested that, "…music can be perceived by listeners as if it were equivalent to a person making a disclosure."
(Watt & Ash, 1998:37). The suggestion is formulated by analogy with spoken communication. For example, if I were to stand before
my peers and deliver this paper, the information that I would be giving would comprise not only documented facts and ideas but
would also involve communicating something of myself - my nature or personality. Thus, although my principal intention may be to
maintain the listeners' interest through a cogent and logical oration, I would also be revealing some uniquely personal qualities as well
as some more general qualities that are capable of identifying me as belonging to particular categories. My voice is capable of
revealing my sex, my approximate age and my ethnic origin, for example. The qualities of my utterances, even in the absence of visual
clues, may reveal something of my present health or state of mind, as well as something of my personal character. It is assumed that
an audience can detect these qualities during the communication even if they are not specifically attending to such. The presumption is
that by the time I have finished speaking my audience will have not only heard the message I intended to convey but will also have
come to some understanding, or have formed some opinion, concerning ‘who I am'. In other words, apart from what I said, how did
I come across? Was I confident and honest, or was I obviously concealing something? Have I doctored my experimental results or am I
just nervous? Do I sound ‘intelligent' or ‘interesting'? Do I like or care about the people I am addressing? And on a more personal
level, am I attractive, and if so, to which sex? Do I sound trustworthy; would you buy a used car from me or let me go out with your
son or daughter? (Hendrix, 1997; Harris & Busby, 1998; Robinson, Obler, Boone, Shane, Adamjee & Anderson, 1998).
Watt & Ash (1998) report that, when forced to make choices between binary opposite adjectives to describe auditioned music, levels
of agreement between respondents appear significantly higher for adjectives assumed to be associated with descriptions of people than
those not normally associated with people. A comparable study using food stimuli rather than musical stimuli did not reveal the same
high levels of inter-respondent agreement. Thus, it was hypothesised that music has an action upon listeners similar to the actions that
the ‘qualities of a person' have upon listeners during spoken communication. In other words, music conveys an impression of its
‘self' or ‘identity' in addition to communicating whatever is ‘meant' by the actual compositional text. This represents a most
interesting departure from previous empirical work because the meaningful natures of these ‘disclosed' representations are
considered akin to psychological aspects of people rather than descriptions of the affective value of sound expressed using emotional
descriptors (e.g. Hevner, 1936; Wedin, 1972). Different musics, therefore, may be said to have different ‘personalities' that are
somehow distinct from the ‘message' actually conveyed by the musical text. The ‘message', to use the terminology of Meyer
(1956), is informed by embodied meanings (sonic stability, tension and resolution) and by designative meanings (suggestions of
symbolic references to extra-musical phenomena). The notion of disclosure meaning, however, rests less with the ‘message' and
more with the ‘musical informant'.
Adopting a forced-choice paradigm requires one to consider the degree to which binary opposite adjectives are both relationally
appropriate and commonly understood. Gross, Fischer & Miller (1989) note that, "…the basic organisational relation between
adjectives has generally been assumed to be antonymy." (p.92). The authors demonstrate experimental support for the relative strength
and perceptual salience of adjectives inferred from the amount of time taken to judge opposites, i.e. to identify antonyms. More
importantly, they provide evidence regarding the different levels of antonymic classification and suggest that there exists a set of
‘direct antonyms' which are most easily and readily identifiable as having relational opposition. In addition, it seems that these
strongest antonymic dimensions (direct antonyms) are also words that are used more frequently in spoken and written language.
In a preliminary study, 6 members of staff in a university department specialising in popular music were asked to rate the plausibility
of describing either people or music using the same 40 descriptive dimensions (binary opposite adjectives). These included the
adjective-pairs originally used by Watt & Ash (1998) and the direct antonyms identified in the study by Gross et al (1989). Two
questions were asked: 1) ‘In general, how often could MUSIC be plausibly described using the following dimensions?'; and, 2)
‘In general, how often could PEOPLE be plausibly described using the following dimensions?' Participants were required to circle
one response on a five-point Likert-type scale comprising the responses ‘very often', ‘often', ‘occasionally',
‘not often', and ‘hardly, if ever' for each of the forty dimensions in respect of both questions (music/people descriptions).
The least plausible dimensions to describe either people or music were: ‘inside/outside'; ‘clear/cloudy'; ‘dry/wet';
‘prickly/smooth'; ‘sweet/sour'; and ‘near/far'. All these dimensions were considered plausible descriptors less than
‘occasionally'. Repeated-measures t-tests revealed no statistically significant differences between respondents' plausibility ratings
when applying these dimensions to descriptions of people versus music. A further six dimensions were found to be highly plausible to
apply to descriptions of people but much less plausible to apply to descriptions of music. These dimensions were: ‘rich/poor';
‘friendly/unfriendly'; ‘male/female'; ‘patient/impatient'; ‘shy/outgoing'; and ‘honest/deceitful'. Responses indicated that
such dimensions were considered likely to be used to describe people more than ‘occasionally', but less than ‘occasionally' to
describe music.
Individual participants gave between 4 and 42 responses when asked to name the musical styles auditioned.
14 musical styles were immediately discarded on the basis that eleven or more respondents (i.e. at least 30%) had provided no answer
and that there was no consensus of opinion regarding stylistic labelling by the participants who did provide a response. Responses to
the remaining styles were put into four categories, arbitrarily entitled ‘correct' and ‘similar' (the "positive pole"), or ‘incorrect' and
‘no answer' (the "negative pole"). The experimenter determined the categorisation of responses with reference to the auditioned
stimuli themselves, rather than the manufacturer's labels (which were renamed according to responses given, if necessary). The
following were considered prototypical exemplars of particular popular music styles based on responses demonstrating a bias in
favour of the "positive pole": hillbilly; country; heavy metal; rock; rock ‘n' roll; blues-rock; blues; boogie-blues; swing band;
modern jazz; bossa nova; samba; latin pop; rap/hip-hop; funk; soul-funk; disco; and techno pop.
EXPERIMENTAL HYPOTHESES
1) When forced to choose between adjectival antonyms to describe musical stimuli, respondents will demonstrate significantly
higher levels of agreement in their preferences for person-like adjectives than non-person-like adjectives.
2) The adjectival dimensions associated with person-type attributions that demonstrate significant levels of participant
agreement will reveal clusters of stimuli that group popular music by developmental/stylistic similarity.
In the absence of adjectival dimensions considered particularly appropriate to descriptions of music, it is proposed that respondents
will suggest that there is some content to the musical information presented that can be better explained by reference to attributions
more commonly associated with descriptions of people. In addition, it is an a priori assumption that the significant person-type
attributions will group stimuli by stylistic traits.
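Hypothesis 1 turns on whether agreement on one pole of a binary choice exceeds chance. The paper does not specify which test was used; one standard option is an exact binomial test against chance p = .5, sketched here (the function name is ours):

```python
from math import comb

def binom_two_sided_p(k, n):
    """Two-sided exact binomial p-value for k out of n respondents
    choosing the same adjective when the chance probability is .5.
    (For p = .5 the distribution is symmetric, so one tail is doubled.)"""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)

# e.g. 26 of 32 respondents agreeing is very unlikely under chance,
# whereas a 16/16 split is perfectly compatible with guessing.
p = binom_two_sided_p(26, 32)
```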
Participants:
Participants (male = 14, female = 18) were volunteers from a variety of backgrounds between the ages of 20 and 62 (M = 39.22, SD =
11.98). No participants had studied music or psychology at advanced levels (i.e. in higher education) and none reported having
hearing difficulties that affected their enjoyment of music. All but one (a postgraduate student studying at a British university)
reported English to be their first/native language. Eight participants considered themselves to be musically active (i.e. played music or
sang in their leisure time). Four of these participants had received musical instruction in the past but none had studied music in an
academic setting since leaving school. Participants reported very broad ranges of personal tastes in music preference.
Fig. 1: The results of a cluster analysis conducted on the five dimensions assumed to lie within the person-type
attribution category.
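Fig. 1 was produced by a hierarchical cluster analysis over person-type attribution dimensions. The stimulus data are not reproduced here, but the general agglomerative procedure can be sketched as follows (single linkage on toy data; the authors do not state which linkage method they used):

```python
import numpy as np

def single_linkage(points, n_clusters):
    """Agglomerative clustering: start with singleton clusters and
    repeatedly merge the pair with the smallest minimum distance."""
    pts = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(np.linalg.norm(pts[p] - pts[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]           # merge cluster j into i
        del clusters[j]
    return [sorted(c) for c in clusters]
```

In the experiment each point would be a musical stimulus described by its attribution scores on the person-type dimensions, and the resulting clusters are then inspected for stylistic coherence.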
Discussion:
The results of the adjectival attribution experiment indicate some support for experimental hypothesis 1; however, anomalies remain
that require the null hypothesis to be retained at this stage. In the person-type category, only one dimension proved to
confound experimental hypothesis 1 in terms of the overall results. The dimension ‘rich/poor' was included in this category based
upon the results obtained in the preliminary adjective survey. In retrospect, it is conceded that there is a serious possibility that
respondents may have interpreted this dimension in differing manners and that this might explain why substantially less agreement
was demonstrated in terms of preferred descriptor. There is semantic confusion evident within this dimension that is
context-dependent and which was overlooked in preparation of the final experimental design. In terms of person-type attributions, the
dimension ‘rich/poor' refers to relative affluence. However, there is clearly a strong alternative attribution that may be made to the
dimension and which could perhaps be situated within the context of ‘value-judgement'. Thus, as a dependent variable, this
dimension is probably invalid since it cannot be guaranteed to be measuring the attribution intended. Indeed, a few participants who
expressed criticisms of the stimuli in terms of ‘musical pleasure' suggested that the synthetic nature of the sounds was ‘poor' and
that this was reflected in their attributions within the dimension ‘rich/poor' specifically. As regards the non-person-type dimensions,
at least one (smooth/prickly) and possibly two (sweet/sour) dimensions may have confounded the experimental design. In the case of
the latter, it is possible that ‘sweet' may actually be considered an appropriate descriptor of music by musically unsophisticated
respondents, thus incorporating a natural bias toward the adjective. It is possible that the experimental stimuli may have influenced
this result, given that each stimulus was in the major mode and harmonically consonant. This may perhaps be evidenced by the fact that
80% of the significant deviations from chance attribution were in favour of ‘sweet' as opposed to ‘sour' (cf. ‘smooth/prickly' for
which significant deviations from chance were split equally). In the case of ‘smooth/prickly', the feedback from one participant was
particularly notable. After suggesting a tendency to think about the musical stimuli as if they represented different types of people, this
participant then stated that it seemed more straightforward to choose an adjective from some dimensions because it was natural to talk
about people using certain descriptors. An example cited in respect of this natural tendency was that of ‘smooth/prickly'!
The cluster analysis performed on the five dimensions assumed to be associated with person-type attributions reveals partial support
for hypothesis 2. Due to the reservations expressed above concerning validity, the dimension ‘rich/poor' was omitted from the
cluster analysis. The grouping of stimuli, illustrated in figure 1, revealed three main clusters. With reference to the
GENERAL DISCUSSION
This investigation has been termed ‘exploratory' in deference to contemporary criticisms of the traditional positivistic paradigm
(e.g. Persson & Robson, 1995). The empirical design has been reported with consideration given not only to the interpretation of
quantitative data but also to the qualitative insights provided by participants. Because of this, it is possible to make strong
recommendations for improving experimental designs aimed at investigating the concept of disclosure meaning in music. The present
research process has revealed that further scrutiny of the dependent variables is warranted. As noted in the previous section, one cannot
confidently assume that adjectival dimensions will fit neatly into categories suggestive of either person-like attributions or
non-person-like attributions. For example, the dimension ‘rich/poor' may refer to a person's wealth but may equally refer to some
‘aesthetic value' associated with the description of music. The justification for empirically determining the classification of
adjectival dimensions arose originally because of suspicions that it is all too easy to infer that little or no musically plausible
attribution will be considered concomitant with certain descriptive terms where, in fact, there may exist an arbitrary, yet conventional,
discourse that incorporates such terms. The first preliminary study supported criticisms of some of the dimensions used in the original
study by Watt & Ash (1998). However, the results of the present attribution experiment and the participant feedback discussed in the
preceding section imply that it is not sufficient to survey only ‘musical experts' when attempting to classify adjectival dimensions
into categories intended to delineate particular attributions. If musically unsophisticated individuals represent the population under
investigation, then there is clearly a requirement to assess the extent to which descriptive attributions are considered plausible
indicators of the phenomena under investigation by musically unsophisticated respondents themselves. As previously noted, some
participants do use words like ‘smooth' and ‘prickly' to describe people, and possibly ‘sweet' to describe music: the quantitative
data are consistent with this.
One of the main emphases of the present investigation has been to explore the possibility that disclosed meanings in music are capable
of delineating the stylistic similarities of different musics. There does seem to be at least partial support for the assertion that
attributions of person-like qualities to music do group music by style. Further, it is suggested that the level of support demonstrated for
this assertion rests upon the basis of musical perceptions that are clearly derived from the general typicality of the stimuli heard as
opposed to the specific characteristics of musical extracts from individual works. Thus, in the absence of contextual information, such
as the idiosyncrasies of expression that may give rise to ‘extra-musical' information about a piece of music, it is suggested that the
information disclosed is textual, i.e. contained within the musical construction. In other words, the agreement demonstrated by
participants in their attributions of person-like qualities to music, generally speaking, is based upon perception of the text, i.e. the
heard characteristics of the musical stimuli. Thus, the attributions made are not dependent on any preconceptions informed by the
REFERENCES
Berz, W.L. & Kelly, A.E. (1998) Research note: Perceptions of more complete musical compositions: An exploratory
study. Psychology of Music 23, 39-47.
Gross, D., Fischer, U. & Miller, G.A. (1989) The organization of adjectival meanings. Journal of Memory and Language
28, 92-106.
Harris, S.M. & Busby, D.M. (1998) Therapist physical attractiveness: An unexplored influence on client disclosure.
Journal of Marital and Family Therapy 24.2, 251-257.
Hendrix, K.G. (1997) Student perceptions of verbal and nonverbal cues leading to images of Black and White professor
credibility. Howard Journal of Communications 8.3, 251-273.
Hevner, K. (1936) Experimental studies of the elements of expression in music. American Journal of Psychology 48,
246-268.
Juslin, P.N. (1997) Perceived emotional expression in synthesized performances of a short melody: Capturing the
listener's judgement policy. Musicae Scientiae 1.2, 225-256.
Meyer, L.B. (1956) Emotion and Meaning in Music. Chicago: Chicago University Press.
Persson, R.S. & Robson, C. (1995) The limits of experimentation: On researching music and musical settings.
Psychology of Music 23, 39-47.
Robinson, K.A., Obler, L.K., Boone, R.T., Shane, H., Adamjee, R. & Anderson, J. (1998) Gender and truthfulness in
daily life situations. Sex Roles 38 (9-10), 821-831.
Stuessy, J. (1994) Rock and Roll: Its History and Stylistic Development [2nd edn.]. Englewood Cliffs (NJ): Prentice Hall.
Umemoto, T. (1990) The psychological structure of music. Music Perception 8, 115-128.
Watt, R.J. & Ash, R.L. (1998) A psychological investigation of meaning in music. Musicae Scientiae 2.1, 33-53.
Wedin, L. (1972) A multidimensional study of perceptual-emotional qualities in music. Scandinavian Journal of
Psychology 13, 241-257.
Proceedings abstract
Anna-Karin Gullberg
anna-karin.gullberg@mh.luth.se
Background:
The fact that rock music sounds different inside and outside the University
College of Music has been known and discussed for years among teachers,
musicians, and researchers in the field of music, as well as in popular culture
and youth studies. Methods of learning and practising appear to strongly
influence expression, aesthetics, and musical taste, yet few researchers have
studied this question empirically, and the body of knowledge remains limited.
Aims:
This study investigates how two ensembles differ in learning strategies,
expression, and musical taste when creating and performing a rock song. An
important part of the project is to pinpoint crucial turning points during the
learning process and to compare the groups' ways of working.
Method:
Two groups of musicians, one from a University College of Music and the other
a formally untrained rock group, were asked to create a rock song from a
melody with lyrics composed especially for this project. Each group had one day
in a recording studio to record and mix their versions of the song. Information
concerning thinking and acting before and during the recordings was collected
by interviews and the two music making situations were observed and
video-taped.
Results:
The music students' version was a jazzy pop tune, while the rock group made a
hardcore tune. Data from interviews and observations showed distinct attitudes
to music-making and to learning strategies when playing in an ensemble. The
music students' approach was distinguished by being thoroughly democratic and
polite. The rock group learned from its singer, who did all the arrangements.
Conclusions:
This suggests that the way one performs music, and which genre one favours, is
shaped by the way one has acquired knowledge about music. With this in mind,
one may wonder whether higher music education gives students the opportunity
to develop the qualifications needed to become open-minded, professional
teachers and performers of music.
Proceedings paper
with an unclear melodic line, so it may cause a stronger impairment of mental tasks performance.
COGNITIVE PROCESS IN READING COMPREHENSION
Learning by studying texts is a common way to acquire new information. As an important part of our
daily cognitive activity it should be done as effectively as possible. In the process of text
comprehension new information is referred to the conceptual system of representations stored in
memory. Acquisition and storage of declarative knowledge is the domain of semantic memory which is
responsible for attaching meaning to the perceived information and for the process of coding it. Craik
and Lockhart (1972) state that the extent and level of information processing influences the
effectiveness of storage and retrieval. The storage of information in semantic memory follows the
analysis and transformation of cognitive structures. These processes demand considerable attention but
yield the best study results. Text comprehension proceeds through the successive construction of a mental
representation of the text. Studying a text requires sustained attention and is therefore easily
disturbed by a musical background that partly occupies the attention span.
At a given level of task difficulty, more intelligent people perform mental operations faster than less
intelligent people, and they can also resist distraction more easily. Their attention span is larger because
of the greater capacity of their information-processing system, and their mental operations are more
efficient. Even though part of their attention goes to the distracting stimuli, they can deal with mental
tasks better than less intelligent people.
MUSIC "PERCEPTION"
Music is an important factor in young people's identity development, and listening to music meets their
need for stimulation. When it serves as the auditory background to a primary task, music is perceived
unintentionally, as passive reception (Jordan-Szymańska, 1991); this is hearing the music rather than
listening to it. Hearing music brings about auditory impressions experienced without any involvement.
However, when auditory stimuli are attractive and easy to attend to, they distract attention from
performing the main task. In such cases mental resources which could be put into dealing with the task
are consumed by processing irrelevant information. Nevertheless, unintentional perception - reception -
doesn't involve mental resources to such a degree as conscious efforts do. Musical elements (e.g.
melodic theme) that strongly attract attention are processed focally and thus most thoroughly in working
memory (Sloboda, 1985). This fact is of great importance in terms of text comprehension with musical
background. Relations within the melodic line can be perceived by means of attention. That's how the
melody can be recognised and compared with the previous or accompanying tone sequences within the
piece of music. Other melodic lines form the harmonic background that is not processed focally. A
listener "drifts" with the melody that attracts his attention and follows the appearance of sound relations
(Sloboda, 1985). In improvisational music the element of predictability is only marginal or even
unnoticeable.
Some authors claim that music with a clear melodic line is easier to remember than an incoherent
progression of tones. The perception of a melody is optimal when the golden mean between its
coherence and variability is found (Jordan-Szymańska, 1990).
Musical sequences must be redundant to some extent for the perceptual organisation of sounds to be
possible: the melodic contour can be slightly modified, but its general shape remains the same.
Melodic contour is an important element of musical structure, as it integrates the piece and enables
the holistic perception of music (Patel and Peretz, 1997). An unclear melodic line disturbs the
perception of the piece and makes it more difficult to remember. In melodic music, a piece based on
clear melorhythmic motives is gradually perceived as a whole structure (Farnsworth, 1958, cited in
Wierszyłowski, 1979): the melodic theme becomes a figure, and all the other musical elements make
up its background.
In improvisational music the melodic structure is unclear: to the listener, the relations between
sounds seem more accidental. The possibility of anticipating the following tones is only marginal, so
the holistic perception and cognitive organisation of the melody require more effort. For those who
are not experienced in listening to music that includes improvisation, its perception is much harder
than that of melodic music, and fixing attention on it demands much greater mental effort. How
music influences the effectiveness of mental operations thus depends on both the type of music and
the task.
THE INFLUENCE OF MUSIC ON TASK PERFORMANCE
Research to date indicates that the performance of complex mental tasks with a musical background
is impaired compared with the same tasks performed in silence, although the results on this problem
diverge. Furnham and Bradley (1997) state that in introverts a musical accompaniment to reading
affects the later recall of the acquired information. According to the study by Freebourne and
Fleischer (1952, cited in Furnham and Bradley, 1997), under such conditions there is no difference in
performance between people of varying intelligence. This finding seems incompatible with
Kahneman's (1973) theory of limited mental resources and with the assumption that intelligence
determines the size of those resources. Peaceful, quiet baroque music is widely believed to improve
foreign vocabulary learning, but the vast majority of young people today do not listen to such music
at all. Eysenck's theory holds that the nervous system of introverts is sensitive to stimulus overload,
whereas extraverts need an environment rich in stimulation. Past research shows that extraverts react
more weakly to distracting stimuli than introverts do: because of their low activation level, it is easier
for them to resist distraction. According to Eysenck's theory, auditory stimulation in the form of
background music should therefore disturb introverts' work significantly, while raising extraverts'
performance by helping them reach the activation level optimal for complex mental tasks. On the
other hand, Konečni (1982, cited in Furnham and Bradley) assumes that every kind of music takes up
mental resources and can be detrimental to all subjects regardless of their extraversion level.
Music with a clear and distinct melody is easier to remember than an incoherent progression of
sounds, whose perception requires certain knowledge and listening habits. A melody is more than a
sequence of sounds, just as the sense of a text comes from more than the semantic content of its
particular words. In a melody there is a hidden order and harmony that enables its holistic
perception.
HYPOTHESES
In this research it was assumed that a musical background impairs performance on a reading
comprehension test in comparison with silence. Melodic music was expected to disturb reading
comprehension to a greater extent than improvisational music. As stated before, intelligent students
were expected to show a higher level of task performance than less intelligent ones; this hypothesis
concerns general as well as verbal intelligence. A positive correlation was anticipated between the
level of intelligence and the results of reading comprehension. Finally, in both musical conditions,
extraverts were expected to score higher than introverts on the reading comprehension test with
musical accompaniment.
SUBJECTS
The Ss were 111 18-year-old secondary school students in the last (fourth) grade of classes with an
intensive foreign language programme. They were assigned by lot to the following experimental
conditions:
1. melodic music - with a distinct melodic line (42 students)
2. improvisational music - with an unclear melodic line (35 students)
3. no music (34 students)
VARIABLES
The main independent variable investigated in the study is the type of experimental condition:
musical background (melodic or improvisational) or silence. General intelligence, which determines
the amount of mental resources, and extraversion, as a factor regulating the optimum intensity of
stimulation, are treated as variables that may determine differences in the level of task performance.
Another variable that cannot be left out of the analysis is verbal intelligence, as a set of specific
abilities concerning the use of language.
The dependent variable is performance on the reading comprehension test, defined as the accuracy score.
EXPERIMENTAL PLAN
At the first stage of the research, the subjects performed the experimental task, which took the form
of a 40-minute reading comprehension test. The test assessed comprehension of the article
"Emotional mimics" by Paul Ekman, as an example of verbal reasoning. The experiment was carried
out in a school classroom under the conditions described above. Afterwards the subjects completed a
post-test questionnaire on their listening habits and their attitude towards the music they had been
exposed to.
The second stage of the research took place three weeks later and comprised three tests in the
following order:
Table 1. Results of the reading comprehension test in the subgroups distinguished by level of extraversion and general
intelligence. (+) indicates the higher level of both variables, (-) the lower level of both extraversion and general
intelligence.
The combined effect of extraversion and general intelligence on the test results appeared only as a
tendency (F = 3.24; p < 0.07). The main effect of the type of musical background on task
performance was significant (F = 4.72; p < 0.01), and the no music group scored the highest on the
test (p < 0.05) (Table 2).
CONDITIONS              TEST SCORE
melodic music           25.45
improvisational music   25.97
no music                30.47
Table 2. Results of the reading comprehension test performance under particular experimental conditions.
The results indicate that the level of reading comprehension correlates significantly with verbal
intelligence in the whole sample (r = 0.32; p < 0.05) and in each experimental group considered
separately (r = 0.27; p < 0.05). The correlation between the experimental conditions and the results
of reading comprehension also proved significant (r = 0.26; p < 0.05). However, performance did not
differ between the melodic and improvisational music conditions. The subjects with higher verbal
intelligence performed best under the improvisational and no music conditions. In the presence of
music, the results of verbal reasoning were significantly worse than in silence, regardless of how
distinct the melodic line of the background music was (melodic vs. no music: p < 0.01;
improvisational vs. no music: p < 0.04).
There was a significant interaction between verbal intelligence and the level of task performance
(F = 12.16; p < 0.0007). The subjects with high verbal intelligence under the no music and
improvisational music conditions scored the highest on the test (Figure 1).
Figure 1. Interaction between verbal intelligence, experimental conditions and the reading comprehension test
performance
Their performance differed significantly from that of the whole melodic group and from that of the
less intelligent subjects in the improvisational group. The mean results of the two groups working
with background music of either type did not differ significantly. Nevertheless, music with a distinct
melody appeared to impair the mental efficiency of students with high verbal intelligence: in that
condition, their reading comprehension results did not differ statistically from those of subjects with
lower verbal intelligence.
DISCUSSION
The study indicated a significant difference between the experimental groups that completed the task
under different conditions, while the results of the two "musical" groups were similar. Verbal
intelligence turned out to be an important factor in the assessment of reasoning on verbal material
with a musical background. Past research did not distinguish between background music with
different types of melodic line.
The interaction of verbal intelligence with performance on the experimental task, together with the
absence of a significant correlation between the effectiveness of verbal reasoning and general
intelligence, indicates that in text comprehension verbal skills are far more decisive than general
intelligence. General intelligence, here measured by the skilfulness of reasoning on graphic material,
embodies cognitive processes that are part of reading comprehension, yet it does not seem to
significantly influence the effectiveness of acquiring verbal information.
The fact that the test results did not vary with the level of extraversion suggests two interpretations.
Extraverts may have predominated in the sample, so that the subjects whose results were compared
did not differ enough in extraversion. Another possible explanation is that the music was so
distracting, and the task so difficult, that stimulation exceeded its optimal level even for extraverts.
Under the melodic music condition, no significant difference between people of varying verbal
intelligence was observed. This suggests that in the presence of melodic music even the peripheral
perception of the melody occupies mental resources to such an extent that too little remains for the
primary task to allow a high level of performance.
It can be concluded that in the presence of melodic music verbal intelligence had no influence on the
level of task performance. Presumably, the significant role of this variable was weakened by the
distinct melody: even in more intelligent students, the melody took up mental resources to such an
extent that they had no chance to perform better than less intelligent students. Improvisational music,
by contrast, produced a significant difference in the reading comprehension results of students with
higher and lower verbal intelligence, which indicates that, as unstructured stimulation, improvisational
music does not absorb mental resources - particularly attention - to the same extent as music with a
clear melodic line.
The study shows that a melodic background equalises the efficiency of people with different levels of
verbal intelligence. Background music hampers the effectiveness of the mental processes that make
up the wider process of text comprehension. Presumably, melodic music engages mental resources to
such a large extent that intelligence has no influence on the efficiency of task performance, whereas
improvisational music does not attract attention as strongly, so intelligent people do better than those
with lower mental resources. The study suggests that music absorbs working memory and occupies
part of the attention span, limiting the mental resources available for processing the relevant verbal
stimuli.
The main hypothesis, that the presence of background music lowers the quantity and quality of work
on the reading comprehension test, has been supported. Individual differences in extraversion and
general intelligence did not appear to affect verbal reasoning. The study implies that, as an
unstructured background, improvisational music does not take up mental resources - particularly
attention - to the same extent as music with a distinct melody, although this effect seems insignificant
in people with comparatively lower intelligence. In the research presented above, listening habits and
the subjects' attitude towards the music they were exposed to had no effect on task performance.
To keep the experimental plan clear, it was necessary to omit such variables as musical preferences,
musical sensitivity, and temperament factors concerning emotional reactivity. It would be interesting
to examine the influence of background music on people involved in intensive musical activities
versus people whose contact with music is passive and rather accidental. The study could not
establish which particular cognitive processes are affected by background music. However, it
provides eager young listeners with information on the disadvantages of making studying compete
with listening to music. The frequent disapproval of parents who observe their children's study habits
does not seem to be empty preaching.
REFERENCES
Anderson, J.R. (1976). Language, memory and thought. Hillsdale: Erlbaum.
Baddeley, A.D. (1976). The psychology of memory. Oxford: Clarendon Press.
Barrett, P.T. & Eysenck, H.J. (1994). The relationship between evoked potential component
Proceedings paper
Introduction
Investigators interested in music preference studies often face the problem of choosing a method of
collecting data. The field of music is complicated by nature, and no perfect tools for measuring
preference are available. The variety of kinds of music, the impossibility of defining them exactly
and unambiguously, and the permanent evolution and change within (and between) genres are only a
few of the difficulties challenging a researcher at the beginning of a survey. Such disadvantages make
every thorough investigator start her/his study with a new, or at least freshly updated, method of
assessing preference.
At present, there are two main approaches to examining music preference (for a brief review see
Rawlings & Ciancarelli, 1997). The first seems the more obvious: if one is to investigate music,
musical material should be used, so recorded excerpts of music are rated by subjects. This approach
is criticised mostly for the rather arbitrary choice of examples by researchers. Going further, however,
one can find other flaws in this way of measuring preference. Preparing musical examples has to
satisfy several criteria that are very difficult to control simultaneously: the level of familiarity of
every piece should be similar among subjects (and between the examples), each chosen piece should
be as representative of the genre in question as possible, and so on. Preference profiles measured
with this method may therefore depend greatly on the material used.
The other way of assessing preference is to use paper questionnaires. The Musical Preference Scale
devised by Litle and Zuckerman (1986) is probably the best-known representative of this approach.
Using such tools is undoubtedly simpler and less time-consuming, hence more appropriate for
studying the whole field of music; moreover, the method is believed to be more objective
(Christenson & Peterson, 1988; Dollinger, 1993; Rawlings & Ciancarelli, 1997; Rawlings et al.,
1998). However, investigators who choose to work with genre labels seem unaware of (or to neglect)
a serious disadvantage: a high likelihood of collecting declarative responses rather than information
about real preferences. Besides, the decision about which genres or categories to include in a list, or
how "broad" they should be, is usually also arbitrary.
There are other difficulties connected with the preparation of testing material that are not specific to
either of the above methods, e.g. the number of excerpts or genres constituting the tool. Christenson
& Peterson (1988) note examples of studies examining music preference by dividing popular music
into only five categories.
It therefore seems interesting to compare the two methods of measuring music preference directly.
Such a comparison could confirm the finding of Müller (1998) that verbal preferences are generally
higher. It could also test a few modifications and thoroughly investigate some flaws of both tools.
Many studies have investigated relationships between music preferences and personality dispositions
(e.g. Litle and Zuckerman, 1986; Dollinger, 1993). Personality is also an element of the interactive
theory of music preference by LeBlanc (1982), where it is considered not only to influence music
interests as such, but also to modify (or even protect from) external, environmental influences.
It is worth examining, then, whether any personality dimensions are predictors of genre labelling
knowledge. For example, openness to experience was found to correlate with a general preference for
a wide range of music types (Rawlings & Ciancarelli, 1997; Rawlings et al., 1998). This factor may
also be responsible for a better acquaintance with genre labels. Additionally, the study allows looking
for a relationship between personality and the similarity of preference profiles assessed with "paper"
and "music" tools.
Method
Subjects
Subjects were 17- to 18-year-old students from a secondary school. Because of the two-stage research
design, only the results of those who completed all tasks (82 students: 58 females, 24 males) were
included in the analyses.
Measures
Genre labels list. A Polish questionnaire of music preference based on the updated version of the
Musical Preference Scale (Litle & Zuckerman, 1986; Rawlings & Ciancarelli, 1997) was developed.
The MPS was independently modified by music store employees to best fit the Polish market. Many
genres were eliminated as absent from Polish culture (e.g. the various kinds of country music were
merged into one country item). Many items were added instead, not only as a result of cultural
differences (such as dividing metal music into several genres), but also because of constant change
and evolution within "youth" music. The examples given as an aid to correct recognition of the
particular genres were also updated.
Moreover, questions about preference for "general" kinds of music (such as rock, classical and so on)
were left out as unsuitable for the research. The only general category (with several subcategories)
was techno music.
The final version of the questionnaire consisted of 72 genres and 7 techno subgenres (see Appendix
1).
Music excerpts. Music examples corresponding to the items of the questionnaire were chosen
according to several principles of selection. First of all, every selected piece had to be the best
representative of its specific genre; this was the precondition for comparing the methods. It was also
important to keep in mind that not everything produced by a band associated with a specific area of
music is characteristic of that area (let Metallica be the example here: would anybody recognise the
band as the leading thrash metal group after listening to its recent concert with a symphony
orchestra?). The compositions also had to be unknown to the listeners.
Technical reasons made 3 examples unavailable, so 75 pieces were recorded on a cassette tape (see
Appendix 2).
Excerpts were about 30 seconds long, fading out at the end, with 2 seconds of silence after each.
Their order was randomised in such a way that examples of similar music did not appear together.
The whole "sound test" lasted about 40 minutes.
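The ordering constraint described above can be sketched as a simple rejection shuffle: reshuffle until no two neighbouring excerpts belong to the same genre family. The excerpt names and genre families below are hypothetical placeholders, not the study's actual tape list.

```python
import random

def shuffle_no_adjacent(items, key, seed=None, max_tries=10_000):
    """Shuffle `items` until no two neighbours share the same `key` value."""
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        # Accept the ordering only if every adjacent pair differs in genre.
        if all(key(a) != key(b) for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no admissible ordering found")

# Hypothetical (excerpt, genre-family) pairs for illustration.
excerpts = [("e1", "metal"), ("e2", "metal"), ("e3", "jazz"),
            ("e4", "jazz"), ("e5", "techno"), ("e6", "techno")]
order = shuffle_no_adjacent(excerpts, key=lambda x: x[1], seed=1)
print(order)
```

Rejection sampling is adequate for a list of this size; for many excerpts with few genre families, a constructive interleaving of the largest families would be more reliable.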
NEO-FFI. The Polish adaptation of Costa and McCrae's NEO-FFI (1992) is the only questionnaire of
the five-factor model of personality translated and normalised in Poland (Zawadzki et al., 1998).
Procedure
Subjects were tested in groups of 25-30 persons. During the first meeting they completed both the
NEO-FFI and the music preference questionnaire. They were instructed to rate their preference for
every genre (not for any particular band or composer) on a 7-point Likert scale, ranging from like
very much to dislike very much, with indifferent as the mid-point. Additionally, an unknown answer
was possible.
Some weeks later, the subjects rated the recorded musical excerpts on the same Likert scale.
Results
The data were analysed with the ANOVA/MANOVA and Nonparametric Statistics modules of
Statistica for Windows (version 5.5). Because the comparison of the two measuring methods was the
main goal of the experiment, direct information about preference profiles is not included here.
The techno subgenres were not differentiated well by subjects: in almost every diagnostic case these
items were rated at the same, case-specific level. They were therefore excluded from the analyses,
and all results concern 70 genres.
Analysis of variance showed significant differences between "paper" and "sound" ratings for 39
genres: 23 of them differed at the p<0.001 level (F ranging from 74.56 to 13.6), 7 items at p<0.01
(10.93>=F>=7.52), and the remaining 9 at p<0.05 (7.52>=F>=4.03). A number of other genres
showed similar relationships, though not statistically significant ones. The significance of the
differences in all but 2 of the above genres was confirmed by the Wilcoxon matched-pairs test.
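The paired "paper" vs. "sound" comparison confirmed above can be sketched with a Wilcoxon signed-rank test on one genre's ratings. The 7-point-scale ratings below are hypothetical values constructed so that declarative ("paper") responses run higher, mirroring the reported tendency; they are not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical paired ratings for one genre on the 7-point scale:
# each subject rates the genre label ("paper") and the excerpt ("sound").
paper = rng.integers(4, 8, size=40)                      # values 4..7
sound = np.clip(paper - rng.integers(0, 3, size=40), 1, 7)

# Wilcoxon signed-rank test on the paired differences (zero diffs dropped).
stat, p = stats.wilcoxon(paper, sound, zero_method="wilcox")
print(f"W = {stat}, p = {p:.4f}")
```

Because it ranks paired differences rather than assuming normality, the Wilcoxon test is the natural nonparametric companion to the per-genre ANOVA on ordinal Likert data.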
For 31 of the 39 items in question, declarative responses were higher than the corresponding opinions
about the musical material. Moreover, most of the remaining 31 genres showed a tendency of the
same kind.
The almost magic "31" appeared once again as the number of genres rated positively in the
questionnaire; 13 of them were also liked in the real music test (as were another 4 excerpts).
Allowing the subjects not to rate unknown items in the questionnaire made it possible to investigate
the level of genre labelling knowledge among youngsters. For a few genres (gothic rock/cold wave,
industrial rock, and all the jazz categories), fewer than half of the respondents indicated their attitude.
Individual differences in this respect were hypothesised to correlate with personality dispositions;
unfortunately, no relationship was found between any of the NEO-FFI factors and the number of
known genres.
Personality dimensions, as assessed with the NEO-FFI, were not related to the stability of answers
between the two methods of measuring music preference. Only in very few cases did high and low
scorers on a personality subscale change their ratings in different ways.
Discussion
The main conclusion drawn from the results is consistent with the observation by Müller (1998):
declarative preferences are generally higher than preferences assessed with musical material. Some
speculative considerations are worth putting forward to reinforce this finding. It seems reasonable to
say that some label names, or the category they belong to, caused them to be rated more severely.
The opinions about heavy metal, American metal (two "softer" subcategories of metal music), hard
rock (commonly associated with "heavier" playing), and world music (belonging to the disliked
folk/ethnic category) were most probably lowered just because of the labels: the ratings of the 4
corresponding musical examples were significantly higher. It may be supposed, then, that declarative
responses sometimes concern concepts that are not well defined.
A very similar inference (although the results contrast with the above) can be drawn from the
musicals and movie music items. The differences between "paper" and "sound" answers were the
highest here, very likely because the terms are too broad, and thus neither clear nor precise;
preference for these kinds of music probably reflects attitudes towards main themes or leitmotifs, so
the "plain" music itself is not liked as much.
On the other hand, dividing music into ever narrower genres makes their precise identification very
difficult. That might be why most subjects in the present study did not know any of the jazz
subcategories: commonly, jazz is... jazz, and no deeper knowledge of this area is acquired.
What is better, then: obtaining almost-surely-false (or not-surely-true) results by creating broad
categories, or accepting that some information will be missing altogether? In any case, the necessity
of rating a general heavy metal genre, for example (in fact an incorrect label for metal music,
covering such miscellaneous subgenres as thrash, death, gothic or American metal), may cause a
certain uneasiness or displeasure in people familiar with that type of music.
The possibility of not rating unknown genres seems to be a valuable solution. Thanks to this option,
subjects were not forced to choose an answer, and probably far fewer indifferent ratings were used.
The second modification of the MPS, namely broadening the Likert scale at its negative end, allowed
subjects to grade their negative opinions: they did not have to choose only between indifferent and
don't like it, and negative attitudes are surely as gradable as positive ones.
Hopefully these two modifications resulted in much more reliable data.
Additional information about another 2 genres used in the study confirms earlier findings on more
positive attitudes towards known music (e.g. North & Hargreaves, 1995). The subjects undoubtedly
knew the disco and hip-hop excerpts, and this familiarity produced higher ratings. This observation
underlines the importance of equalising the familiarity of all the pieces used.
Unexpectedly, no relationship between personality and the range of known genre labels was found.
Although openness to experience correlates with a higher number of preferred styles of music,
broader knowledge of labels is not related to this factor, so preferences for a wider range of music
types are independent of the size of the pool of known genres.
To summarise, preference profiles measured with a genre label list are usually higher than those
assessed with music excerpts.
In the light of this comparison, paper questionnaires do not seem to be as superior a measuring
method as their advocates have claimed. In particular, trouble with the breadth of the terms used and
the impossibility of precisely defining genres versus subgenres make the tool far from ideal.
Of course, it is also almost impossible to create an ideal music excerpt test: too many variables
would have to be controlled at once to secure a similar level of familiarity across examples, their
representativeness, and so on.
It therefore seems very important not to choose a measuring tool only on the basis of the
investigator's preference for a given method.
References
Christenson, P.G. & Peterson, J.B. (1988). Genre and Gender in the Structure of Music Preferences.
Communication Research, 15, 3, 282-301.
Dollinger, S.J. (1993). Research Note: Personality and Music Preference: Extraversion and
Excitement Seeking or Openness to Experience? Psychology of Music, 21, 73-77.
LeBlanc, A. (1982). An Interactive Theory of Music Preference. Journal of Music Therapy, 19, 1,
28-45.
Litle, P. & Zuckerman, M. (1986). Sensation Seeking and Music Preference. Personality and
Individual Differences, 7, 4, 575-578.
Müller, R. (1998). Young People's Distinction Between Verbal and Sounding Preferences - An
Indicator of Musical Literacy. International Annual Meeting of the German Society for Music
Psychology. The Musicians Personality. Proceedings: Schedule and Abstracts. Universität Dortmund.
North, A.C. & Hargreaves, D.J. (1995). Subjective Complexity, Familiarity, and Liking for Popular
Music. Psychomusicology, 14, 77-93.
Rawlings, D. & Ciancarelli, V. (1997). Music Preference and the Five-Factor Model of the NEO
Personality Inventory. Psychology of Music, 25, 120-132.
Rawlings, D., Twomey, F., Burns, E. & Morris, S. (1998). Personality, Creativity, and Aesthetic
Preference: Comparing Psychoticism, Sensation Seeking, Schizotypy, and Openness to Experience.
Empirical Studies of the Arts, 16, 2, 153-178.
Zawadzki, B., Strelau, J., Szczepaniak, P. & Sliwińska, M. (1998). Inwentarz osobowosci NEO-FFI
Costy i McCrae. Adaptacja polska. Podręcznik. [Costa and McCrae's NEO-FFI personality
inventory. Polish adaptation. A manual]. Warszawa: Pracownia Testów Psychologicznych PTP.
Appendix I
The list of genres used in the music preference questionnaire (translated into English)
Rock
1. Rock and roll/classic rock (Beatles, Rolling Stones, Doors)
2. Acid/psychedelic rock (Jimi Hendrix, Grateful Dead, Jefferson Airplane)
3. Jazz-rock (Pat Metheny, Mahavishnu Orchestra, SBB)
4. Progressive/symphonic rock (Pink Floyd, King Crimson, Yes)
5. Electronic rock (Tangerine Dream, Kraftwerk, Klaus Schulze)
6. Pop rock (Queen, Madonna, Kylie Minogue)
7. New wave (Stranglers, Depeche Mode)
Proceedings paper
The factor scores of each musical stimulus were calculated and are shown in Figures 1-1 to 1-3. In Figure 1-1, Factor 1 (Metallic) is represented on the horizontal axis and Factor 2 (Pleasant) on
the vertical axis; the succeeding figures plot Factor 1 against Factor 3, and Factor 2 against Factor 3, respectively. "MM" is the symbol for "melodious music" and "S" for "sound-logo"; the other
dots show the sounds classified as "neutral".
As shown in Figure 1, the 6 melodious musics (MM) were perceived as powerful and metallic, and some subjects regarded this departure music as not very pleasant. On the other hand, the 8
sound-logos gave a slightly unsatisfactory impression but a more pleasant feeling than the melodious music, except for S1. The neutral music was perceived as sound without any particular character.
Figures 2-1 and 2-2 show the profiles of the "melodious music" and the "sound-logo", respectively, based on the average judgments of the 20 subjects.
Figure 2-1 Profile of the "melodious music"
Figure 2 shows the difference between "melodious music" and "sound-logo" clearly. The melodious musics are perceived as "noisy" (M=2.5, SD=0.22), "shrill" (M=5.5, SD=0.25), "unstable"
(M=4.93, SD=0.43), "hurried" (M=2.18, SD=0.34), and "loose" (M=4.82, SD=0.4). The sound-logos, meanwhile, give various impressions: some are perceived as "unsatisfactory", "calm" and
"grave", while others are not. No characteristic common to all the sound-logos can be found in these results.
Multiple regression analysis.
The factor loadings above were then analysed in relation to the musical features of the departure music using multiple regression analysis. Eight musical elements were extracted from each stimulus: the type of modulation, the progression of the cadence, the rate of harmony transition, the tone density, the lowest tone height, the highest tone height, and the mean and variance of pitch. The regressions for all three factors were significant or marginal: Factor 1 (F=5.53, p=0.001), Factor 2 (F=2.28, p=0.06), and Factor 3 (F=3.08, p=0.01). The regression coefficients of the explanatory variables indicated that the lowest tone height was most closely related to the metallic factor (p=0.009), the type of modulation and the progression of the cadence to the pleasant factor (p=0.003, p=0.08), and the tone density to the powerful factor (p=0.005).
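The regression step described above can be sketched as follows. This is a minimal illustration with synthetic data: the study's 30 stimuli, their 8 extracted musical features, and the factor scores are not reproduced here, so every number below is invented.

```python
# Sketch of regressing a factor score onto musical features via
# ordinary least squares. Data are synthetic stand-ins for the
# study's 30 stimuli and 8 musical elements.
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_features = 30, 8              # 30 departure-music excerpts, 8 elements
X = rng.normal(size=(n_stimuli, n_features))   # e.g. modulation, cadence, tone density, ...
beta_true = np.array([0.0, 0.0, 0.0, 1.2, -0.8, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.3, size=n_stimuli)   # a factor score, e.g. "powerful"

# Add an intercept column and solve the least-squares problem.
X1 = np.column_stack([np.ones(n_stimuli), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Proportion of variance explained by the fitted model.
y_hat = X1 @ coef
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 2))
```

In the paper's analysis, the per-predictor p-values (e.g. for the lowest tone height) would come from t-tests on these coefficients; a statistics package such as statsmodels reports them directly.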
Based upon the data given by the 20 subjects, the following results were found:
1) Subjects' judgements of thirty styles of departure music from the Tokyo and Osaka areas divide into 3 main factors: metallic, pleasant and powerful.
2) "Melodious music", as categorized by music experts, is judged "noisy", "shrill", "unstable", "hurried" and "loose".
3) Some "sound-logo" music makes a more pleasant, though slightly unsatisfactory, impression compared to the "melodious music".
4) The "metallic factor" has some relation to the lowest frequency.
5) The "pleasant factor" has a relation to the way of modulation, and the progression of cadence.
6) The "powerful factor" has some relation to the tone density.
4. Discussion
As metropolitan areas continue to grow and the number of people commuting to the central city increases, public transportation serves ever more passengers. Ten years ago, the Japan Railway Company reformed its departure-signal system, replacing the electric bells with music; it is said that some users had complained about the tone quality of the bells. Nowadays, more people find themselves daily in crowded, noise-packed trains. While getting on and off a train, we hear, every 20 seconds, repeated announcements, ringing departure music, shrill whistles and warning bells.
What causes these noise problems? Three reasons can be pointed out. First, Japanese public corporations offer us surplus services. Second, some departure music does not work as a departure signal. Third, many people are unconcerned about environmental sound issues. For the moment, let us look closely at the effects of music.
There are, of course, many studies of the effects of music in social environments. According to many researchers, music can influence the extent to which consumers interact with commercial environments. Dube, Chebat and Morin (1995) found interactive effects of musically induced pleasure and arousal on consumers' desire to affiliate with bank employees. North and Hargreaves (1996, 1998) found that liking for the music played in a student cafeteria was positively related to diners' willingness to return. Areni and Kim (1993) found that classical music led customers in a wine cellar to buy more expensive wine. Other studies indicate that music mediates affect toward the store or the store image (for instance Bawa et al., 1989; Milliman, 1986; Golden & Zimmer, 1986). These studies focus on relationships between music listening and its commercial or social context, and their shared conclusion is that some music evokes positive affect in its context. On a station platform, however, there is some doubt whether we can enjoy listening to musical fragments while boarding a crowded train. Does the music have positive effects on irritated passengers? Does the departure music fulfil its function well, or does it further aggravate the situation? This needs further consideration. Judging from the above, it is no exaggeration to say that the Japan Railway Company's decision simply to change the bells into music was a short circuit.
Our results indicate that certain musical features of departure music are deeply connected to passengers' emotions. Subjects' judgements could be divided into 3 main factors: metallic, pleasant and powerful. The metallic factor was related to the lowest tone height: if the music is played in a higher pitch range, the metallic impression is felt strongly. The pleasant factor was related to the progression of the cadence and to modulation: it is likely that if the music moves through an unexpected chord or key change, we are displeased. The powerful factor was strongly connected to fast tempo and tone density: it seems reasonable to suppose that music played at a fast tempo or with a quick rhythm is felt as strongly energetic.
References
Areni, C.S. & Kim, D. (1993). The influence of background music on shopping behavior: Classical versus top-forty music in a wine store. Advances in Consumer Research, 20, 336-340.
Bawa, K., Landwehr, J.T., & Krishna, A. (1989). Consumer response to retailers' marketing environments: An analysis of coffee purchase data. Journal of Retailing, 65, 471-495.
Dube, L., Chebat, J.C. & Morin, S. (1995). The effects of background music on consumers' desire to affiliate in buyer-seller interactions. Psychology and Marketing, 12, 305-319.
Golden, L.L., & Zimmer, M.R. (1986). Relationships between affect, patronage frequency and amount of money spent with a comment on affect scaling and measurement. Advances in Consumer
Research, 13, 53-57.
Igarashi, J. (1993). Development of community noise control in Japan. The Journal of the Acoustical Society of Japan, Special Issue on the creation of comfortable sound environment, 14(3), 177-180.
Milliman, R.E. (1986). The influence of background music on the behavior of restaurant patrons. Journal of Consumer Research, 13, 286-289.
Namba, S. & Kuwano, S. (1993). Global environmental problems and noise. The Journal of the Acoustical Society of Japan, Special Issue on the creation of comfortable sound environment, 14(3), 123-126.
North, A.C., & Hargreaves, D.J. (1996). The effects of music on responses to a dining area. Journal of Environmental Psychology, 16, 55-64.
North, A.C., & Hargreaves, D.J. (1998). The effect of music on atmosphere and purchase intentions in a cafeteria. Journal of Applied Social Psychology, 28(24), 2254-2273.
Ogawa, Y., Mizunami, T., Yamasaki, T., & Kuwano, S. (1999). The preference of signal music at railway stations in Tokyo. Children and Music: Developmental Perspectives, 233-240.
Sasaki, M. (1993). The preference of the various sounds in environment and the discussion about the concept of the sound-scape design. The Journal of the Acoustical Society of Japan (E), 14, 189-195.
Schafer, R. M. (1992). Music, non-music and the soundscape. In J. Paynter, T. Howell, R. Orton & P. Seymour (Eds.), Companion to Contemporary Musical Thought (pp. 34-45). Routledge.
Sterne, J. (1997). Sounds like the Mall of America: Programmed music and the architectonics of commercial space. Ethnomusicology, 41(1), 22-50.
Proceedings paper
Background:
During the last ten years, Techno has developed into an independent style of popular music. Compared to other kinds of popular music, such as Pop or Rock, Techno is a very distinctive style. It has neither lyrics nor the musical structure or form of Pop songs, but a strong, repetitious rhythm (Keller, 1995). It offers no groups or singers to identify with or fall in love with: the typical Techno "band" consists of computers and the DJs who program them to create the music. The first Techno pieces were produced not to be sold in shops, but to be performed directly in a club as a unique performance (Jerrentrup, 1995).
The present study addresses the psychological context and function of Techno music, or simply: who likes Techno, and why do they like it?
Everybody prefers some musical styles to others: a person might like Pop songs but completely reject Reggae. Differences in musical taste can be observed in the music people listen to, in the recordings they buy, or in the live performances they attend. The preferred music in these three cases need not be one and the same style; a "real" fan, however, prefers his or her style to others in all three. The measurement of musical taste is not straightforward. Different studies include different musical styles, described with more or less specificity. Many studies distinguish between Pop, Rock, Reggae, Hip Hop and Heavy Metal, all of which could be classified as "popular music" (Lewis 1996, Hakanen and Wells 1990), but rarely between genres of "classical music" such as Baroque or Romantic (Kemp 1997). Studies of musical preferences are also hard to compare because of differing methodology: one set of researchers may look at verbally expressed preferences, another at the music participants actually listen to, and a third at the purchase of recordings.
Studies of musical preferences often rest on data sheets in which participants rated their preferences for named musical styles. It is unclear whether participants know all the characteristics of a given style, or the differences between, say, Pop and New Age. An unknown term could easily be rated 'disliked' where an audio example would have been rated 'liked'. Another critical point about some studies is the use of questions like 'Do you like or dislike X?'. Rating a style as 'liked' does not necessarily mean that people have ever bought a recording of that style or would go to a concert, and it gives no information about how integrated that music is in people's everyday lives.
Behne (1997) suggested that different methodologies can influence the results of preference research. He investigated musical preferences in a cross-sectional design, comparing preferences for played musical examples (without telling participants the name of the composition or the composer) with answers to questions like 'Do you like Jazz?'. The two sets of preferences, Behne reports, "are far from being identical". Even though Behne's main interest was a specific mode of musical experience ('Musikerleben'), the data support the need for a methodology in which both sides (participants and researchers) agree on the specific characteristics of the musical styles under investigation.
To some extent, Lewis (1995) took such a double check of musical preferences into account. Participants picked their favourite style from a list of ten major musical types and also answered a question on their favourite recording. This methodology shows more accurately whether the favourite recording actually represents one of the styles picked from the list. Asking for a larger number of favourite CDs or recordings could give even more information about the strength of a preference, be it for an artist, a group or a style.
The present study therefore uses a methodology that combines different ways of specifying musical preferences: verbally expressed preferences, preferences for played examples, and the integration of the music into everyday life.
Being a fan of a certain musical style almost always includes being part of a specific subculture. According to Russell (1997), subcultures set themselves apart from mainstream culture in various ways. The defining variable can be being a fan of Manchester United, of Bruce Willis, or of a musical style like Country, Heavy Metal or Hip Hop. To each defining variable further variables are added, e.g. specifically coloured T-shirts for football fans, or particular types of clothing for different musical styles. Sometimes ideological or political components are important as well.
Some added variables for Techno fans have already been investigated. Rose (1995) analysed typical Techno fashion, which is based on the character of the music itself. One basic element of the music is sampled sound; the fashion likewise uses samples, taking many ideas from the sixties and seventies and combining them with "pop-art" sportswear and label logos. The materials of the clothing and other fashion products reflect the technical, artificial production of the music: plastic and shiny materials are very popular in Techno fashion.
Other, less visible elements of the culture, such as political involvement, have not yet been investigated.
Various studies deal with the functionality of music and with personal choice in certain situations. Research can look at participants' use of different musical styles from two perspectives:
a. the actual music they listen to, and
b. participants' reflections on the music they would choose in specific situations.
Rosenbaum and Prinsky (1987) used the first approach. Participants listed three of their favourite songs, followed by a personal description of each song and a reason why they liked it. Participants did not write their own explanations but chose one from a list of seven, as the researchers were additionally interested in the importance of lyrics in choosing a favourite song. Some participants paid no attention to the lyrics and explained their choice with "It's good to dance to", which suggests that music can serve different functions.
Sloboda (1999) applied a comparable methodology. Participants answered a general question, "Could you please tell us all about you and music", with cues such as "Do you use music in different ways?" concerning music of their own choice, and cues like "Do you enjoy music in pubs or supermarkets?" concerning music in public places. Participants described their private use and preferences, as well as their liking of music in public places, in a personal letter to the researcher. The study suggests that people choose their music carefully to achieve a psychological function: music can be used as a 'reminder of valuable past events' or to change mood. The most popular activities while listening to music were doing housework, driving, running and cycling. The specific functions can be labelled, but no one style of music can be ascribed to one particular activity. Concentrating on one group of fans can therefore show whether there are preferred or excluded activities while listening to that particular style.
Another way of looking at the use of music was introduced by Behne (1997). Based on the 'uses and gratifications approach' of Katz et al. (1974), it presents situations with different emotional connotations. Participants rated the kind of music they would like to hear in the given situations on a seven-point Likert scale with eight pairs of opposite adjectives. The activity Behne focused on was a specific type of listening, with joy and appreciation. These data show a similar tendency to Sloboda's: participants know what kind of music they want to listen to in a specific situation.
North and Hargreaves (1996) employed a more elaborate methodology, built on the idea that musical preferences are associated with the listening environment. In contrast to Behne, North and Hargreaves did not focus on one specific situation but gave 17 different situations. To specify the music participants would like to hear in each situation, 27 musical descriptors were rated. The descriptors were a mixture of adjectives such as 'familiar', 'sad' and 'beautiful', styles like 'Jazz', 'Pop' and 'Classical', and activity-related descriptors like 'can dance vigorously to it'. The importance of each descriptor for a situation was rated on an 11-point Likert scale. Two steps led first to situations with similar emotional connotations, and then to information on the music preferred in each situation. Two factor analyses were carried out. The first investigated the relationships among the musical descriptors and yielded six factors. The first factor is the most important for the present study: descriptors loading positively on Factor I were loud, strong rhythm, invigorating, can dance vigorously to it, attention-grabbing, exciting/festive, and pop music; loading negatively were quiet, relaxing/peaceful, classical music, beautiful, and lilting. The factor was interpreted as arousal. Techno might be an example of this factor, leaving the descriptor 'pop music' aside.
The second factor analysis investigated the relationships among the 17 proposed situations and yielded five factors. The situations jogging, with your Walkman on, at a nightclub, at an end-of-term party with friends, doing the washing-up, ironing some clothes, driving on the motorway, and on Christmas Day with your family loaded positively on Factor I, which was interpreted as activity.
Product-moment correlations were then calculated between the similarity of situations' emotional connotations and the similarity of the music reportedly preferred in them. The result was small but significant (r = .21): the more similar the emotional connotations of two situations, the more similar the music participants reported preferring in them. For example, situations arousing in nature (like dancing and jogging) are associated with musical descriptors that likewise imply arousal; the musical selection supports the atmosphere of the situation. These data show the character music should have for use in specific situations, but they give no information on one particular style or on the activities actually preferred.
Hypotheses
1st Hypothesis: There will be a positive relationship between knowing Techno and liking it.
Techno is a musical style with specific characteristics quite different from other popular styles such as Rock: it has no lyrics, verses, clear musical form or 'song melody'. On the other hand, Techno is similar to styles such as House and Drum'n'Bass, e.g. in its repetitive rhythm. Since these styles are not broadcast on general radio or TV programmes, one must be quite involved in the culture that uses them in order to understand the differences between them. Such people are likely to use the music and likely to like it. This does not mean that all other participants must dislike it, but they need not know its correct name, as shown by Behne (1997).
2nd Hypothesis: Because of Techno's musical characteristics, dancing is the most preferred activity when fans listen to Techno.
This hypothesis rests on two sources: suggestions from the study by North and Hargreaves (1996), and two pilot interviews with Techno fans. The pilot interviews suggested a strong preference for Techno as dance music; one reason a participant's friend did not like the music was that "she couldn't dance to it". The repetitive rhythm and flowing shapelessness make it possible to dance for a long time, even though the music could be tiring just to listen to.
3rd Hypothesis: The more comfortable somebody feels dancing to Techno, the more he or she likes to listen to this music in other situations.
In North and Hargreaves' study, music described by the Factor I descriptors is appreciated in more situations than just dancing. Techno, as a possible representative of Factor I, might likewise be preferred in other situations. However, feeling comfortable dancing to Techno is the prerequisite for liking to listen to Techno elsewhere: whoever does not feel comfortable dancing to it does not like to listen to it. Listening to Techno in other situations could remind you of the pleasure you get in a club, which makes you feel better.
The most important research questions for the interviews are not formulated as hypotheses. A question such as 'Why do people like Techno?' can lead to answers that can hardly be predicted, since various topics concerning Techno have not yet been investigated and its uniqueness and special qualities are unknown.
The study was conducted in two steps: a quantitative part, mainly to find participants for the interviews and to answer the question 'Who likes Techno?', and a qualitative part, to understand why they like it.
Step one
The main purpose of the quantitative part was to find real Techno fans. The literature discussed earlier suggests three steps for finding fans of a certain style: first, participants listen to musical extracts, name the style and rate how much they like the music; second, they answer general questions on their preferred musical styles; third, they list their favourite recordings.
The purpose of the questionnaire was to divide participants into a) students who really know what Techno is (connoisseurs) and b) students who do not (non-connoisseurs), and further into a group of Techno fans and a group of non-fans. Further questions asked, e.g., how much they liked each extract, how likely they would be to listen to this kind of music in particular environments, and how likely they would be to dance to it or to buy recordings of it.
Method:
153 students from a college and a university in middle England and from a German Gymnasium and universities in Berlin (aged 16 to 30) participated in the first part.
Five one-minute extracts were played to the participants, in groups of up to 20 students, on an audio tape recorder. After each extract, participants filled in a questionnaire; their answers to the question 'What would you call the style of this extract?' divided them into connoisseurs and non-connoisseurs. The extracts represented the styles Drum'n'Bass ("drum in a grip" by Logical Progression), Techno ("Deltroid" by Hardline2), House ("Blow Ya shistle" by J. Dubs), Pop (instrumental part of "Fantasy" by G. Michael) and Trance ("Ritual of Life" by Sven Väth).
Step two
The second part consisted of interviews with Techno fans, to answer the question of why people like it: in which situations is Techno the preferred music, and which situations are excluded? What is so special about Techno and its culture in the eyes of Techno-philes?
Method
11 students took part in the interviews. They were chosen because they were able to distinguish the Techno extract from the Drum'n'Bass, House, Trance and Pop extracts. As long as they identified the 'Techno' extract, they did not have to name the other styles correctly, but they must not have labelled any of them Techno. In addition, Techno had to be one of their favourite styles, and specific Techno pieces had to be among their preferred recordings.
The interviews comprised 15 open-ended questions and were taped and transcribed afterwards. They included questions on the participants' personal associations with Techno, social influences, how long they had known the music, the circumstances of their first contact, and the perception, functionality and use of Techno.
Quantitative Results
The quantitative part served two purposes: first, to find participants who are 'real' Techno fans, based on knowing and liking the played extracts, questions on preferred styles, a list of favourite recordings, and questions on the everyday use of the styles represented by the extracts; second, to use parts of the questionnaire to test the Hypotheses.
1st Hypothesis
Looking at the relationships between liking the musical extracts, knowing the style and preferring the style, the 1st Hypothesis can be supported.
A one-way ANOVA with knowledge of the style as the independent variable and liking of the extract as the dependent variable showed a significant effect for the Techno extract (F(2, 138) = 11.78, p < 0.001). Tukey post-hoc tests located the difference between the group that correctly identified the Techno example (mean liking = 3.48, where 1 = like a lot and 5 = not at all) and all other participants (mean liking = 4.26).
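An analysis of this shape can be sketched in a few lines. The ratings below are invented for illustration (the study's raw responses are not available here); the grouping mirrors the design: liking ratings (1 = like a lot, 5 = not at all) grouped by whether participants identified the Techno extract.

```python
# One-way ANOVA sketch: does liking differ by style-identification group?
# All ratings are invented stand-ins, not the study's data.
from scipy.stats import f_oneway

correct    = [3, 3, 4, 3, 2, 4, 3, 3]   # identified the extract as Techno
wrong_name = [4, 5, 4, 4, 5, 4, 5, 4]   # named a different style
no_answer  = [5, 4, 4, 5, 4, 5, 4, 5]   # gave no style name

f_stat, p_value = f_oneway(correct, wrong_name, no_answer)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

The Tukey post-hoc comparisons reported in the paper would be a separate step (e.g. `scipy.stats.tukey_hsd` in newer SciPy versions), pinpointing which pairs of groups differ.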
A one-way ANOVA with Techno as favourite style as the independent variable and liking of the extracts as the dependent variable suggests a significant difference between participants whose favourite style is Techno and all others with regard to liking the Techno extract (F(1, 147) = 40.39, p < 0.0001) and the extracts similar to Techno, but not the Pop extract (F(1, 143) = 2.656, p = 0.151).
Both results support the prediction of the 1st Hypothesis: participants who knew the name of the style liked the Techno extract significantly more than participants who did not, and the connoisseur group was the one that generally liked Techno.
2nd Hypothesis
Spearman's rho between liking the Techno extract and liking to dance to Techno shows a positive correlation between liking the extract and the likelihood of wanting to dance to Techno (Spearman's rho = 0.661, n = 151, p < 0.0001).
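A rank correlation of this kind can be sketched as follows, again with invented ratings on the study's 1-to-5 scales (1 = very much, 5 = not at all); the actual responses are not reproduced here.

```python
# Spearman rank correlation sketch: liking the Techno extract vs.
# likelihood of dancing to Techno. Ratings are invented stand-ins.
from scipy.stats import spearmanr

liking_extract = [1, 2, 2, 3, 4, 5, 3, 1, 4, 5]   # liking of the Techno extract
likely_dance   = [1, 1, 2, 3, 5, 5, 4, 2, 4, 4]   # likelihood of dancing to it

rho, p = spearmanr(liking_extract, likely_dance)
print(f"rho = {rho:.3f}, p = {p:.4f}")
```

Because both variables are ordinal Likert ratings, a rank-based coefficient such as Spearman's rho is the natural choice over Pearson's r.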
A one-way ANOVA with liking a certain style (Techno, Classical, Rock, Pop, Jazz, Indie, or Heavy Metal) as the independent variable and the likelihood of dancing to the Techno extract as the dependent variable suggests that the disposition to dance to Techno depends on the style somebody prefers. Techno fans were more likely to dance to Techno than non-fans (F(1, 153) = 21.24, p < 0.0001; mean fans = 3.33, mean non-fans = 4.24), while participants who listed other styles as their favourite were not very likely to dance to it. Fans of Classical music (mean fans = 4.35, mean others = 3.78, F(1, 153) = 4.714, p = 0.031) and of Heavy Metal (mean fans = 4.35, mean others = 3.75, F(1, 153) = 5.836, p = 0.017) were significantly less likely to dance to Techno than non-fans of these styles.
Both analyses support the 2nd Hypothesis: the more participants liked the Techno extract, the more they would like to dance to Techno. Only participants whose favourite style is Techno would like to dance to it; fans of other styles were not very likely to do so.
3rd Hypothesis
Spearman's rho correlations between the likelihood of dancing to Techno and the likelihood of listening to it in the other situations given in the questionnaire showed that the more somebody likes to dance to Techno, the more he or she would like to listen to it in other situations and environments. The situations were: at home (Spearman's rho (Sr) = 0.824; n = 157 and p < 0.0001 throughout), in a bar (Sr = 0.451), with friends (Sr = 0.678), to relax (Sr = 0.438), to energize (Sr = 0.717), being likely to go to a concert (Sr = 0.825), how often they attend (Sr = 0.705), and being likely to buy recordings (Sr = 0.724).
A one-way ANOVA with being a fan of a certain style other than Techno as the independent variable and the likelihood of listening to Techno in different situations as the dependent variable was applied to investigate whether fans of other styles are also likely to listen to Techno in those situations. None of the results shows that fans of another style like listening to Techno in other situations; some even suggest the opposite tendency. Fans of Classical music or Heavy Metal, for example, are less likely than non-fans of these styles to listen to Techno to arouse and energize themselves, or to listen to it with friends (arouse/energize for Heavy Metal fans: mean fans = 4.45, mean others = 3.88, F(1, 153) = 5.68, p = 0.018; for Classical fans: mean fans = 4.5, mean others = 3.91, F(1, 153) = 4.738, p = 0.03; 'with friends' for Heavy Metal fans: mean fans = 4.18, mean others = 3.62, F(1, 153) = 5.551, p = 0.02; for Classical fans: mean fans = 4.32, mean others = 3.63, F(1, 153) = 6.739, p = 0.01; with 1 = very often and 5 = not at all).
All these results support the 3rd Hypothesis: participants who like to dance to Techno also like to listen to it in other situations, while participants who prefer other styles are not likely to listen to Techno in any other situation, sometimes even less likely than non-fans of their own style. Techno fans also reported that they would be likely to listen to Techno in order to relax. These data seem to contradict North and Hargreaves' idea that the musical selection 'augments' a situation's connotations, on which relaxing should not be one of fans' favourite activities while listening to Techno.
Qualitative Results
The three-step methodology of the quantitative part divided participants into groups: first, connoisseurs and non-connoisseurs; second, committed Techno fans and fans of other musical styles; third, those who count Techno recordings among their favourites and those who do not.
Only participants who gave 'positive' answers on all three counts took part in the interviews: they knew the right name for the Techno extract and were not confused by similar extracts, Techno was one of their favourite musical styles, and their favourite recordings included Techno.
The interviews a) gave more specific information supporting the Hypotheses, and b) highlighted a uniqueness of Techno that does not show up in questionnaires.
The 2nd and 3rd Hypotheses are supported by statements given in the interviews. Above all, the interviews underline the importance of dancing as the favourite activity: compared with all other activities mentioned, 'dance' was named more often than all the others together (mean 'dance' = 16.45, mean all other activities = 6.0, M = 11.1, SD = 6.51, t(10) = 2.11, df = 9, p = .063). The other activities included relaxing/chilling, working and driving a car.
Even though 'dance' was an important issue in all interviews, the context in which the word was used differed, and some situations were described without the keyword 'dance'. Participants' first associations with Techno underline its importance as dance music, e.g. the first association of Chris (a 30-year-old English student):
Ahm, dancing, you know, yea dancing ... in Clubs.
Reasons to prefer Techno as music to dance to are strongly related to musical elements. One possible explanation of why people do not like Techno was:
Why? I think a lot of people don't like it because it is very repetitious, but I think that's a good thing about it, because it is a regular rhythmic beat, ah, you tend to focus on that, and your mind can take you elsewhere, whereas in lots of like guitar music or folk music you tend to concentrate on the lyrics, or the performance, but in dance music it is more the rhythm.
Besides backing the formulated Hypotheses, the interviews answered the research question of why people like Techno. The reasons given fall into two groups: one related to dancing to Techno, the other to the subculture as perceived by participants.
What makes dancing in Techno clubs special for its fans is a more relaxed atmosphere. In contrast to other clubs, 'nobody is there to record your dancing movements' (Clair, a 24-year-old English student); in other words, Techno is not focused on dancing with a partner and does not involve a 'come-on game'.
According to Thomas, a 25 year old German student, Techno creates a 'community' where people
have a similar way of thinking. This statement seems to be supported by other participants, who even
specified cultural characteristics: they perceive Techno culture as open minded and able to break
down cultural and sexual barriers.
Techno fans are able to specify situations when they listen to Techno and situations when they prefer
different music. For them, situations like doing housework, driving a car, preparing to go out at night,
getting into a good mood or even 'producing a creative mood' (Thomas) are strongly connected with
Techno music. The choice of specific styles depends on personal preferences: Chris reported that he
does not listen to 'aggressive Techno' at home, and Clair likewise prefers soft Techno for driving or
chatting with friends.
Almost all participants mentioned that they do not listen to Techno in order to relax, because it is not
slow or calm enough. Participants were never asked directly whether or not they relax to Techno, but
most of them mentioned it during the interview, like Beate (a 17-year-old German student):
I've got such a CD with various things on it, nice things, when it should be calm, for relaxation and
so. Well, Techno is nothing to relax to [...].
Discussion
The analyses of the quantitative part suggest a strong relationship between knowing and liking a
musical style. Knowing includes not only the ability to say what Techno is, but also the ability to
distinguish Techno from different musical styles such as Pop, as well as from styles close to Techno,
such as House or Trance. This ability was found among Techno fans and did not appear among
non-fans. The most preferred activity of Techno fans seems to be dancing. The preference for Techno
as dance music can be understood as the key reason for liking to listen to Techno generally: those who
do not like to dance to Techno tend not to listen to it in other situations.
Data from the questionnaires did not clearly show the front-runner position of dancing. However,
comments in the interviews established that dancing is fans' most preferred activity while listening to
Techno.
Fans perceive dancing to Techno as different from dancing to, e.g., Pop music. The uniqueness of
dancing to Techno stems from its special musical elements, which differ from those of structured
popular music. The combination of a strong, dominant rhythm with dancing creates a feeling that
could be described as hypnotic; attention is not drawn to the lyrics or the performing band.
Not only is the dance situation perceived as different compared to Pop or Rock concerts and clubs,
but so is the culture itself. Fans regard their culture as open-minded and less prejudiced against
minorities than other musical cultures, and compared to mainstream clubs the atmosphere is described
as more friendly. This could be an effect of different drug use and deserves further investigation.
Which type of Techno fans choose, differing in tempo or sound combinations, depends on whether
they are driving a car, talking with friends or dancing. One situation in which even fans do not like to
listen to Techno is trying to relax: the music Techno fans choose to chill out to tends to be calmer and
slower than Techno. This supports North and Hargreaves' explanation that musical selection supports
the atmosphere of a situation.
The two-part methodology made it possible to use the first part to select suitable participants for the
interviews. This important first step was structured to find participants who, firstly, were able to
distinguish between similar types of Techno and styles close to it, an ability demonstrated by
identifying the styles of the musical extracts played. Secondly, the participants included in the
interviews appreciated the Techno extract, were likely to listen to it in different environments and
were likely to dance to it. Thirdly, Techno was one of their favourite styles. This information was
gathered among the personal questions and double-checked against participants' favourite recordings,
which included, e.g., Techno CDs, to make sure that a research expectation did not influence the
chosen style; after four musical extracts all similar to Techno, it was not difficult to conclude that the
study was about Techno music. This three-step methodology enabled me to find participants who are
really involved in this style and represent a good sample of Techno fans.
Besides using the quantitative part to identify fans, the combination of quantitative and qualitative
approaches showed that the qualitative approach can in some respects clarify quantitative outcomes.
Qualitative data were essential not only to support the second hypothesis, that dancing might be the
most important activity of Techno fans, but also to address the seemingly contradictory result that
people might listen to Techno in order to relax. North and Hargreaves' research would suggest that
Techno is not relaxing music, yet the differences between the means of Techno fans and non-fans
suggest that fans are significantly more likely to relax to Techno than other participants. On the other
hand, interpretation of the interviews makes it obvious that even fans do not count Techno among the
music they relax to. The difference between the quantitative and qualitative outcomes could be caused
by the nature of the questions. The quantitative part posed a more hypothetical question: participants
were asked whether they would be likely to listen to Techno for relaxation. The interviews, by
contrast, gathered information on personal use; the answers are therefore not hypothetical, but show a
picture of real attitudes.
The decisive point is that the methodology applied identified the real fans among all participants, and
questioning only those fans led to a more detailed view of Techno fans and their cultural elements.
References
Behne, K.-E. (1997a). The development of "Musikerleben" in adolescence: How and why young
people listen to music. In I. Deliège & J. A. Sloboda (Eds.), Perception and cognition of music.
Hove: Psychology Press.
Hakanen, E. A., & Wells, A. (1990). Adolescent music marginals: Who likes metal, jazz, country,
and classical. Popular Music and Society, 14(4), 57-66.
Jerrentrup, A. (1995). Techno Musik. In M. Henger & M. Prell (Eds.), Popmusic - yesterday -
today - tomorrow (pp. 107-121). Regensburg.
Katz, E., Blumler, J. G., & Gurevitch, M. (1974). Utilization of mass communication by the
individual. In J. G. Blumler & E. Katz (Eds.), The uses of mass communication. London.
Proceedings paper
Mr Mark Tarrant
mt37@leicester.ac.uk
Background:
Little research has considered the contribution that music makes to social
identity in adolescence. The research that has addressed this has indicated
that adolescents may use music as a means of distinguishing their own peer
group from other groups. More specifically, adolescents have been shown to
associate the ingroup with positively stereotyped music to a greater extent,
and negatively stereotyped music to a lesser extent, than they associate an
outgroup (see Tarrant, North, and Hargreaves, 1999). Such findings are
consistent with social identity theory (Tajfel, 1982). However, to date
research has not investigated the importance of music relative to other
interests in this process. This is the purpose of the present study.
Aims:
The study will demonstrate the extent to which music is considered a valued
dimension in the intergroup behaviour of adolescents.
Method:
175 male adolescents aged 14-15 years took part in the study. They were
recruited from a school in the West Midlands region of the UK. A questionnaire
presented participants with 27 statements concerning adolescents'
attitudes/leisure interests, and participants were required to estimate how
much each statement described members of the ingroup and members of the
outgroup. The statements covered a wide variety of musical and non-musical
interests (e.g. "they enjoy listening to classical music"; "they enjoy
listening to indie music"; "they enjoy watching current affairs programmes").
The participants then rated the 27 items for how desirable or undesirable the
ingroup believed each one to be. The final section contained six items which
assessed level of ingroup identification.
Results:
The results are currently being collated. It is hypothesised that these will
demonstrate support for social identity theory, and will confirm music's status
as a valued dimension in young people's intergroup behaviour.
Conclusions:
Proceedings paper
Perception Of Musical Styles, People Listening To Them, And Reasons For Listening
Hasan Gürkan Tekman & Nuran Hortaçsu, Middle East Technical University
Listeners may have a variety of reactions to different musical styles. Hargreaves and North (1999)
have listed these reactions as stylistic sensitivity, discrimination, knowledge, liking, tolerance, and
competence. Stylistic knowledge can be defined as knowledge of verbal labels associated with
different musical styles. We took stylistic knowledge to be not only about labels that are used to refer
to musical styles but also about the relationships between different styles, characteristics of different
styles, reasons for listening to different styles, and characteristics of people who listen to different
styles. Each of these four issues was investigated with a method that involved directing first
open-ended, then more structured questions to participants. The procedures that were followed and results
that were obtained relating to these four issues are described in separate sections below.
Musical styles and how they are related
The initial task in this research project was to identify a small number of distinct musical styles
familiar to the college student population that responded to our questionnaires. For this purpose, first,
participants were asked to list as many names of musical styles as they could. Then, the names of the
styles that were listed most frequently were given to another group of respondents and they were
asked to group them so that similar styles would be in the same group. The data thus obtained were
used in a hierarchical cluster analysis, in which six distinct clusters were identified.
1. Pop music: This cluster contained the labels pop, foreign pop, and Turkish pop. "Pop" was
selected to represent this cluster in the following stages of the research.
2. Rock and metal: This cluster contained rock, metal, and heavy metal. "Rock" was selected to
represent this cluster.
3. Western art music: This cluster contained classical, jazz, and blues. "Classical" was selected to
represent this cluster.
4. Contemporary dance music: This cluster contained rap, techno, and underground. "Rap" was
selected to represent this cluster.
5. Turkish musical styles: This cluster contained Turkish folk music, Turkish art music, and
Özgün music, which is a more recent development in Turkish music. It shows influences of
both folk and art music and its lyrics contain political comment. "Turkish folk" was selected to
represent this cluster.
6. Arabesk: This is another style that is indigenous to Turkey. It combines aspects of folk music
and traditional art music of Turkey with the musical styles of Egyptian and Indian movie
musicals, which were popular in Turkey during the 1950s. Lyrics in Arabesk typically describe
the sufferings of a protagonist of lower socioeconomic origins whose aspirations are frustrated
by an unjust fate.
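The sorting-and-clustering step above can be approximated by agglomerative clustering on co-grouping counts, i.e. how often respondents put two labels into the same group. A pure-Python single-linkage sketch; the labels, counts and threshold below are invented for illustration, and the study itself used a standard hierarchical cluster analysis:

```python
def cluster(labels, co_counts, threshold):
    """Greedy single-linkage clustering: keep merging two clusters while
    any cross-cluster pair was co-grouped at least `threshold` times."""
    def count(a, b):
        return co_counts.get((a, b), co_counts.get((b, a), 0))
    clusters = [[lab] for lab in labels]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                best = max(count(a, b) for a in clusters[i] for b in clusters[j])
                if best >= threshold:
                    clusters[i] += clusters.pop(j)   # merge cluster j into i
                    merged = True
                    break
            if merged:
                break
    return clusters

# invented co-grouping counts out of 20 respondents
co_counts = {("pop", "Turkish pop"): 18, ("rock", "metal"): 17,
             ("pop", "rock"): 3, ("pop", "metal"): 2,
             ("Turkish pop", "rock"): 2, ("Turkish pop", "metal"): 1}
groups = cluster(["pop", "Turkish pop", "rock", "metal"], co_counts, threshold=10)
```

With these invented counts the procedure recovers two clusters, a pop group and a rock/metal group, mirroring the kind of structure the analysis reported.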
How are musical styles described?
For the purpose of answering this question, first, the respondents who listed musical styles were asked
to list three adjectives that those styles brought to their minds. Second, the adjectives that were used
most frequently were presented to a different group of respondents together with the names of the six
musical styles that were selected for further investigation. Respondents had to rate the appropriateness
of each adjective to each musical style on a five-point scale. The data collected from these scales were
submitted to factor analyses for the six musical styles separately. Then, scale reliabilities were
calculated for the groups of adjectives that were consistently put together in these analyses. As a
result, three main dimensions with satisfactory scale reliabilities emerged.
1. Evaluation: This dimension brought together the adjectives meaningful, pleasant, high quality,
(not) boring, (not) irritating, (not) simple, lasting, and (not) monotonous.
2. Activity: This dimension brought together the adjectives lively, exuberant, dynamic, exciting,
entertaining, and rhythmic.
3. Peacefulness: This dimension brought together the adjectives harmonious, sentimental, restful,
peaceful, and soothing.
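Scale reliability for adjective groups of this kind is conventionally reported as Cronbach's alpha. A minimal sketch with invented five-point ratings (three adjectives rated by four respondents), not the study's data:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha. `items` is a list of per-item rating lists,
    one rating per respondent, all lists of equal length."""
    k = len(items)
    item_var = sum(variance(it) for it in items)      # sum of item variances
    totals = [sum(vals) for vals in zip(*items)]      # each respondent's total
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# invented five-point ratings of three adjectives by four respondents
alpha = cronbach_alpha([[5, 4, 2, 1], [5, 5, 2, 1], [4, 4, 3, 1]])
```

When items move together across respondents, as here, alpha approaches 1; a "satisfactory" reliability is commonly taken to mean alpha around 0.7 or above.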
The six musical styles clearly differed in terms of the appropriateness of these three dimensions for
them. In terms of the evaluative dimension classical and Turkish folk were rated most positively while
rap and Arabesk were rated least positively. On the activity dimension rock, pop, and rap were rated
highest while Arabesk was rated lowest. Arabesk and rap were rated as the least peaceful as well,
while classical was rated highest on this dimension.
Why do people listen to different styles?
For the purpose of finding out what our respondents thought about this question, first, respondents
were given the names of the six musical styles selected and asked to list the reasons people may have
for listening to them. The most frequently mentioned reasons were presented to a different group of
respondents as five-point scales in the next step and respondents were asked to rate how appropriate
each reason was for listening to each style. The reasons for listening were grouped in factor and scale
reliability analyses as described in the previous section. Four dimensions emerged as main reasons for
listening to musical styles.
1. Listening in the background: This dimension brought together listening for background
accompaniment, relaxation, feeling good, diverting one's mind from troubles, and passing time.
2. Listening for movement: This dimension brought together listening for dancing, movement, and
catharsis.
3. Listening for appreciation: This dimension brought together listening for thinking and
appreciating art.
4. Listening for identity: This dimension brought together listening for reviving identity, having a
sense of community, nostalgia, and finding expression of one's feelings.
The reasons for listening were well differentiated for different styles of music. Classical and pop were
best suited to listening in the background and Arabesk was least suited to this purpose. Pop, rap, and
rock were the best candidates for listening for movement and Arabesk was the least preferred for this
purpose. Arabesk was also least suited for listening for appreciation while classical was the choice for
this purpose. Turkish folk was rated as most suitable for listening for identity and pop and rap were
least suited to this purpose.
Who listens to different styles?
For the purpose of answering this question, first, respondents were given the names of the six selected
styles and they were asked to describe what kind of person would listen to each one of them. The
adjectives that were used most frequently were presented to a different group of respondents as
five-point scales in the next step. Respondents were asked to rate the listeners of the six musical styles
in terms of these adjectives. They were also asked to rate how much these qualities fit themselves and
how much they would desire to have these qualities. Then, the desirability ratings of the adjectives
were submitted to a factor analysis. Scale reliabilities were calculated on the ratings of the six musical
styles on the adjectives that were grouped together in the factor analysis. Three dimensions describing
the listeners of music emerged with consistently high scale reliabilities.
1. The loser: This dimension brought together the adjectives pessimistic, aggrieved, disturbed,
poor, and defeated.
2. The sprightly: This dimension brought together the adjectives dance-loving, fun-loving, wild,
and vigorous.
3. The sophisticated: This dimension brought together the adjectives (not) unenlightened, young,
educated, refined, and mature.
The six styles differed in terms of how strongly they were associated with each listener type. The loser
type was most closely associated with Arabesk and least with classical. The sprightly type of listener
was most appropriate for pop, rock, and rap, and least appropriate for Arabesk. The sophisticated
listener was most appropriate for classical and least appropriate for Arabesk. In addition, although
respondents who liked and disliked a musical style did not describe themselves differently, they
disagreed when asked to rate listeners of that style on these three dimensions: respondents
who liked a musical style tended to describe its listeners more favorably than did those
who disliked it.
Conclusion
We observed remarkable consensus on how musical styles and people who listen to them are
perceived. One can say that stylistic knowledge involves a multifaceted representation of musical
styles, why people listen to them, and what kind of people would listen to them. Possible reasons our
respondents reported for listening to different musical styles are consistent with proposals that music
serves the functions of emotional and intellectual stimulation (Berlyne, 1971; Meyer, 1956), mood
manipulation (Konecni, 1982), and creating and consolidating social identity (Crozier, 1997; Sloboda,
1985). The differences between how respondents who liked and those who disliked a musical style
described fans of that style point to the importance of music as a way of expressing social identity
and group membership (Tajfel, 1981).
References
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton-Century-Crofts.
Crozier, W. R. (1997). Music and social influence. In D. J. Hargreaves & A. C. North (Eds.)
The social psychology of music. Oxford: Oxford University Press.
Hargreaves, D. J. & North, A. C. (1999). Developing concepts of musical style. Musicae
Scientiae, 3, 193-216.
Konecni, V. J. (1982). Social interaction and musical preference. In D. Deutsch (Ed.), The
psychology of music. New York: Academic Press.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago: University of Chicago Press.
Sloboda, J. A. (1985). The musical mind: The cognitive psychology of music. Oxford:
Clarendon Press.
Tajfel, H. (1981). Human groups and social categories. Cambridge: Cambridge University
Press.
Proceedings abstract
kominek@astercity.net
Background:
Aims:
This research aimed at (i) identifying the criteria which determine the
process of recognising folk songs which have a range of local and individual
variants, and (ii) examining the extent of tolerance towards alterations of the
parameters which are crucial to the tune identification process.
Method:
10 experienced female folk singers from south-eastern Poland were chosen for
the research. They were asked to assess a dozen or so variants of songs
selected from the region's popular folk repertoire. The songs, recorded over
the last 50 years, are rather diverse in terms of melodic-rhythmic structure
and performance style. The research method combined an informal interview and a
"same-or-different" test.
Results:
In progress.
Conclusions:
Symposium introduction
Rationale
Recent attempts at developing automated music transcription systems re-emphasise the need for
detailed knowledge about categorisation in rhythm perception, since straightforward grid quantisation
often results in overly complex notations. Knowledge about how human musicians perceive and
interpret rhythms is not only of theoretical significance, but is also likely to improve automated
transcription systems.
Aims
The symposium aims to gain insight into human categorisation in the perception and interpretation of
rhythmic patterns. First, it attempts to reveal the relationship between quantisation and categorical
rhythm perception. Second, it tries to make explicit the role of context. Third, it aims to discover how
empirical insights can be incorporated into models that may also be part of automated music
transcription systems.
Speakers
Eric Clarke will give an overview of the field, and provide the theoretical platform for the discussion
by making a distinction between continuous (non-categorical) and discrete (categorical) aspects of
rhythm perception. Peter Desain will present experiments which aim to describe in detail the shape of
the rhythm categories and how they are affected by tempo and metrical context. George Papadelis will
show how these categories form and change with increasing musical experience of the listener. In
addition to presenting empirical work, Ed Large will propose (an outline of) a dynamical model which should
be able to bring together these different aspects of categorisation in rhythm perception.
Discussant
Bruno Repp, who has contributed substantially to the study of categorical perception, will chair the
symposium.
Proceedings abstract
Background:
Research suggests that preschool children provided with piano instruction score higher on the Object
Assembly (OA) task of the Wechsler Preschool and Primary Scale of Intelligence.
Aims:
The present study's goal was to understand the nature of this enhancement. The OA task involves (a)
sequential problem-solving, (b) mental imagery formation, and (c) mental image transformation. We
administered a large battery of other tasks which draw upon these to determine the specific cognitive
abilities that are enhanced by music training. Because the failure to develop spatial/abstract reasoning
represents the most glaring deficiency of deprived children, children enrolled in a federal at-risk
intervention program served as subjects.
Method:
Eighty-eight at-risk three-year-old children were pre-tested using several visuospatial cognitive,
perceptual, and standardized tests. Instructors visited the children's preschools to provide private
15-min weekly instruction in either the piano or the computer for 24 weeks (14 weeks/year). A third
group of children received no special training. All children were then post-tested. Gordon's Primary
Measures of Music Audiation was also administered.
Results:
Children in the music group scored significantly higher on several tests measuring mental imagery
formation. The magnitude of the effect was similar to that found in previous studies with
middle-income children. Sequential problem solving and mental image transformation were not
affected by music instruction. Scores of the children who received computer lessons or no training did
not differ significantly on most measures. Musical aptitude scores did not significantly correlate with
spatial task scores.
Conclusions:
These findings suggest that music training significantly improves mental imagery formation in young
children. This research will help researchers understand the links between intellectual abilities and
how development in one sphere might influence the development of related processes in another
sphere.
Proceedings paper
Abstract
There has been a limited quantity of work on the categorical perception of rhythm, and
the work that exists suggests that categorisation is closely linked to the perception of
metre in rhythmic sequences. A possibly misleading consequence of the existence of the
term ‘categorical perception' is the implication that categorical perception is somehow
‘special' and that it only applies in certain circumstances.
The aim of this paper is briefly to review previous work on categorical perception of
rhythm and to argue that perception always has a categorical (and a non-categorical)
component. There is in this sense nothing ‘special' about the categorical perception of
rhythm. The theoretical framework for this perspective is event perception, and the paper
will argue that rhythm perception is categorical because it is events (which have a
discrete character about them) which are perceived in music. The nature of these musical
events will be briefly discussed. Perception is, however, not only categorical, and it is
important to consider those aspects of perceptual experience that are continuous rather
than discrete. For rhythm, the non-categorical component is experienced as an expressive
or characterising modifier, from which a listener can pick up such things as the
competence or nervousness of the performer, and the quality of movement (real or
imagined) that has given rise to the music.
The implications of this view are on the one hand to make categorical perception less of
an issue in its own right, and on the other hand to show that it is an endemic and intrinsic
feature of perception and therefore of more general significance than its rather technical
name might suggest.
1. Introduction
The term ‘categorical perception', originating in work on the perception of speech and colour, has
frequently been used to suggest that some perceptual domains are ‘special' and demonstrate
categorical perception, while others do not manifest this feature and (presumably) demonstrate
continuous perception. This outlook has implied or directly suggested that categorical perception
confers advantages on the perceptual domains to which it applies, such as speed of processing and
distinctness, and (according to some authors) may be the result of ‘hardwiring', or special
sensitivity. For example, it has been argued that categorical perception in speech is a consequence of
the adaptive significance of language, and has been proposed as one of the components of a putative
language module. By contrast, the approach I will suggest here is that categorical perception is
nothing special at all, and is the inevitable consequence of the sensitivity to events that characterises
an ecological understanding of perception.
2. Categorical Perception
Harnad (1987) provides an overview of the literature on categorical perception in its various
manifestations. Operationally defined, categorical perception is characterised by two effects: i) in an
identification task, as the stimulus material is gradually transformed from one point on a stimulus
continuum to another, subjects show a relatively sharp discontinuity between the probability of
making an 'A'-type judgement to any one of a number of continuously variable stimuli and the
probability of making a 'B'-type judgement; ii) in a discrimination task, subjects show better
performance when a pair of stimuli separated by a given amount δ on the stimulus continuum is
taken from across the putative category boundary than when a pair with the same objective separation
δ is taken from within either of the putative perceptual categories. It has sometimes been
claimed (and for some particular phenomena) that categorical perception is innate, and at other times
that it is learned (see Livingstone, Andrews & Harnad, 1998).
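The two operational effects can be illustrated with a toy logistic identification function: a sharp identification boundary, and a discrimination advantage for pairs that straddle it, derived here simply from the difference in identification probabilities (an illustrative sketch, not a model from the literature; boundary and slope values are arbitrary):

```python
import math

BOUNDARY, SLOPE = 0.5, 20.0   # illustrative values on a 0-1 stimulus continuum

def p_a(x):
    """Probability of an 'A'-type identification at continuum position x;
    drops steeply from near 1 to near 0 around the category boundary."""
    return 1.0 / (1.0 + math.exp(SLOPE * (x - BOUNDARY)))

def discriminability(x, delta=0.1):
    """Predicted discriminability of the pair (x, x + delta): largest
    when the pair straddles the category boundary, small within a
    category, mirroring the classic discrimination peak."""
    return abs(p_a(x) - p_a(x + delta))
```

Equally separated pairs taken well inside either category yield almost no identification difference, while a pair spanning the boundary yields a large one, reproducing effect (ii) from effect (i).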
3. Categorical Rhythm Perception
Some 50 years ago Fraisse, in his work on rhythm, proposed a categorical distinction between two
classes of rhythmic duration, which he termed temps longs and temps courts. The distinction between
the two categories was not framed within the classic operational definition given above (Fraisse did
not carry out identification and discrimination experiments), but it nonetheless demonstrates some of
the same principles and properties. Fraisse proposed that events with a duration of less than about 400
msec (temps courts) are qualitatively distinct from events with durations of more than this value
(temps longs) in that events in the former category are not perceived as having duration at all, but
rather what he calls ‘collection', while events in the latter category are perceived as having true
duration. Fraisse claims that listeners perceive temps courts as event collections which spontaneously
group together into rhythmic Gestalts in which there is no awareness of the duration between attack
points (event onsets), while temps longs demonstrate no such spontaneous aggregation into groups,
and lead to a definite sense of duration between attack points. He summarises the idea as follows:
"Rhythmic structures ... consist of the interplay of two types of value of temporal
interval, clearly distinct from one another (in a mean ratio of 1:2). Within each type the
durations are perceptually equal to one another. The collection of shortest intervals
appears ... to consist of durations less than 400 msec." (Fraisse, 1956, p. 30. Author's
translation)
Fraisse's monograph of 1956 reported further data which can be seen as the precursors of more recent
interest in categorical rhythm perception. He measured the inter-onset durations of subjects'
spontaneous tapping, which when plotted on a histogram showed a strongly multimodal distribution,
with peaks in the distribution at integer ratios between durations (primarily 1:1 and 1:2, with a less
well-defined peak at 1:3). In other studies also reported in the monograph, Fraisse noted the tendency
for subjects to transform their rhythmic productions in the direction of integer ratios when trying to
reproduce non-integer stimulus rhythms, which he explained according to a principle of assimilation
and distinction. The idea (which has a clear Piagetian legacy) is that ratios between adjacent durations
that are non-integer values ‘migrate' towards nearby integers in reproduction by a process in which
the durations become either more similar to one another (assimilation) or more different (distinction).
For example, two durations of 300 and 400 msec (in a ratio of 1:1.33) will tend to drift towards 350 +
350 (= 1:1) by a process of assimilation; while two durations of 300 and 500 (a ratio of 1:1.66) will
tend to drift towards 270 + 540 (= 1:2) by a process of distinction. While the relationship between the
temps longs/temps courts distinction and the idea of integer ratios is not clearcut, taken together
Fraisse's work suggested the possibility that rhythm might be perceived in terms of categories which
were both qualitatively distinct, and integer related.
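Fraisse's assimilation/distinction can be sketched as snapping the ratio of two adjacent durations to the nearest integer. The sketch below preserves the pair's total duration, which is my own simplifying assumption; the worked figures above (300 + 500 migrating to 270 + 540) do not keep the sum exactly:

```python
def migrate(d1, d2, ratios=(1, 2, 3)):
    """Snap a pair of adjacent durations (msec) to the nearest integer
    ratio, redistributing their total duration between the two values."""
    lo, hi = sorted((d1, d2))
    target = min(ratios, key=lambda r: abs(hi / lo - r))  # nearest integer ratio
    total = d1 + d2
    new_lo = total / (1 + target)
    new_hi = total - new_lo
    return (new_lo, new_hi) if d1 <= d2 else (new_hi, new_lo)

assimilated = migrate(300, 400)   # 1:1.33 drifts to 1:1 (assimilation): (350.0, 350.0)
distinct = migrate(300, 500)      # 1:1.67 drifts to 1:2 (distinction)
```

The 300 + 400 pair becomes more similar (assimilation, matching the 350 + 350 example in the text), while the 300 + 500 pair becomes more different, ending at a 1:2 ratio (distinction).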
In a paper published in 1981, Povel took up Fraisse's suggestion of the ‘migration towards integers'
and in a reproduction task investigated these migrations in a more systematic and thorough fashion,
incorporating the effect into an explanatory framework that recognised more explicitly the central role
of metre. As with Fraisse, Povel's work cannot be considered to embody the standard demonstration
of categorical perception (he never uses the term in the paper, and uses a reproduction task rather than
the standard identification and discrimination methods), but the ‘migration' effects in his empirical
results are suggestive of the within-category instability and between-category distinctiveness that is
fundamental to the whole notion of categorical perception.
In a paper published in 1987 (Clarke, 1987) I took up the implicit indication that rhythm might be
perceived categorically and tackled the question with the standard methods of identification and
discrimination. The data reported there showed the characteristic features of categorical perception (a
disjunction in the identification function as subjects switch from one perceptual category to another,
coupled with a peak of discriminability when pairs of stimuli are taken from either side of the
category boundary), together with a metrical effect causing the category boundary to shift so as to
make a larger proportion of the stimulus continuum consistent with the prevailing metre. In simple
terms, the effect of the metrical context is to cause subjects to perceive potentially ambiguous rhythms
in a fashion which supports and confirms the prevailing metre.
Towards the end of the paper, I considered the perception of so-called rhythmic ‘microvariations' -
those continuous variations in local tempo that have been widely researched in music performance,
and which correspond to small departures from integer proportions at the level of adjacent durations.
The implication of categorical perception, narrowly and rigidly defined, is that these microvariations
should be barely detectable if they occur within categories, and yet they have been widely regarded as
a critical component of expression in musical performance - from both a production and perception
point of view. Furthermore, other studies (e.g. Clarke, 1989; Repp, 1992) have demonstrated that
listeners are sensitive to much finer rhythmic distinctions than the coarse-grained categories would
suggest. The answer to this apparent paradox (as suggested in the 1987 paper) is to recognise that the
microvariations constitute a different kind of rhythmic sensitivity, experienced as expression rather
than rhythm:
"After the temporal information for rhythm has been categorised, any ‘remainder' (i.e.
any deviations from a perfect categorical fit) is considered to be expressive information,
or perhaps accidental inaccuracy. This expressive information is perceived as
continuously variable, and is used by performers to indicate structural markers in the
music and thus particular interpretations ... It is perceived by listeners as qualitatively
different from the temporal information that specifies rhythmic structure ... The
separation of temporal information into a domain of structure and a domain of expression
resolves the apparent paradox that small whole number ratios are the simplest to perceive
and reproduce, but that real human performances do not conform to these integer
perception. The absence of metre is, after all, a perfectly legitimate and quite perceptually striking
rhythmic effect. The discrimination function showed a distinct peak between this central area and both
of the adjacent metrical categories, and the result indicates that categories unanticipated by the
experimenter, and not designed to be in the stimulus materials, may nevertheless make themselves
felt.
The relationship between the microvariations of expressive timing and rhythmic categories is
paralleled in Windsor's study by the relationship between metrical accents and syncopations (or other
kinds of expressive accents). Windsor's results suggest that his subjects were unable to perceive small
differences in accentuation (expressive differences) outside a metrical framework:
"Syncopation and expressive dynamics exist in opposition to metrical accents, giving no
information for metre. ... The important point here is that information redundant in one
domain may be important in another. Just as expressive information can only be
perceived in relation to the non-expressive rhythmic structure, syncopation is
inconceivable outside a metrical framework." (Windsor, 1993, p. 138)
4. Quantisation
Closely linked to the notion of categorical rhythm perception, and using the language and principles
of automated rhythm processing as employed in commercial sequencer programs, is the idea of
quantisation. Quantisation is the process whereby the continuously variable event durations of a real
performance are rationalised into discrete rhythmic values as represented in standard music notation.
For sequencers and notation programs, this stage of processing is required so as to give rise to
practical and musically ‘sensible' representations that might then be read by subsequent performers,
or to allow one sequencer track to be coordinated with other tracks. However, the process has also
been regarded as analogous to (or even identical with) the perceptual process by which a listener or
co-performer parses the rhythmic structure of a performed event sequence, the quantisation process
filtering out expressive microvariations so as to reveal the underlying rhythmic structure. This
perceptual interpretation of the term (as opposed to its purely operational use) goes back at least as far
as the work of Longuet-Higgins (e.g. Longuet-Higgins and Steedman, 1971) whose pioneering
artificial intelligence work on music took as its starting point the proposition that recovering a
conventional notational representation from a performed sequence should be regarded as modelling a
perceptual process. Standard music notation, it is asserted, is not just a set of conventions, but
embodies musical understanding in a formalised manner. If a program could be written that rendered
performed music into the same notation that a skilled human transcriber would produce, then that
program should be regarded as manifesting musical intelligence, and would be (implicitly or
explicitly) a model of musical cognition.
Longuet-Higgins' early attempts at this modelling made use of a very simple approach to the
quantisation problem (a term that he actually never used): having set the tempo of a performance
either by means of a series of introductory beats, or by using the length of the first note (or few notes)
as a standard, the program looked for successive beats, and multiples and divisions of beats, with a
window of tolerance within which the next note had to fall. If the next note fell within that tolerance
window, the appropriate beat length was confirmed; if it preceded or followed the boundary of the
tolerance window it triggered a shorter or longer beat unit. Longuet-Higgins has never provided a
definitive statement on the value that the tolerance window should take, though in one paper
(Longuet-Higgins, 1979) he proposes that it might take an absolute value of around 100 msec.
Intuition suggests, however, that a window that was at least partially proportional to the current beat
duration might be more plausible, since the tolerances around the timing of semibreves and quavers,
for example, are unlikely to be the same. The principle that Longuet-Higgins proposed, however, is
essentially that adopted by most sequencer manufacturers: the quantisation function in most
commercial sequencers simply sets up a metrical grid, and if a note falls within a certain tolerance
range of one of the ‘grid lines' its onset is moved forwards or backwards in time so as to
synchronise with the grid.
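The grid-based approach can be sketched in a few lines. This is an illustrative reconstruction, not code from any actual sequencer: the function name, the grid size and the tolerance value are all assumptions.

```python
def grid_quantise(onsets, grid=0.25, tolerance=0.06):
    """Snap note onsets (in beats) to the nearest line of a fixed metrical
    grid, in the manner of a commercial sequencer's quantise function.

    Any onset lying within `tolerance` of a grid line is moved onto it;
    onsets outside the tolerance window are left where they were played.
    """
    quantised = []
    for t in onsets:
        nearest = round(t / grid) * grid
        if abs(t - nearest) <= tolerance:
            quantised.append(nearest)
        else:
            quantised.append(t)  # outside the window: leave unquantised
    return quantised

# A slightly 'expressive' semiquaver run snaps back onto the 0.25-beat grid:
played = [0.02, 0.27, 0.49, 0.76]
print(grid_quantise(played))  # -> [0.0, 0.25, 0.5, 0.75]
```

Because each onset is treated independently against a fixed grid, the sketch shares the weaknesses discussed in the next paragraph: it cannot track tempo drift, and an onset falling outside the window is simply left unprocessed.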
Though computationally simple and reasonably effective with performance data that are already quite
metronomic, there are problems with this approach. In its basic form it cannot cope in a sensible way
with tempo drift (i.e. continuous changes in the underlying tempo); and it is very sensitive to, and
easily disrupted by, local irregularities - a single misrepresentation resulting in a cascade of errors.
Although commercial products have tried to get around some of these problems with a variety of
quick fixes of one sort or another, the underlying problem is that quantisation is tackled on an
event-by-event basis. Desain and Honing have approached the problem using rather different methods
(e.g. Desain & Honing, 1989), based on the principle that a flexible and robust solution must consider
the mutual adjustment of adjacent time intervals in a sequence, together with the relationships between
time intervals both at the same level and with the superordinate time intervals formed out of
aggregations of basic-level units. They use a connectionist network to
achieve this goal, in which the immediate inter-onset intervals (level 1 IOIs) of a rhythmic sequence
are held in a buffer, and over a number of iterations are steered towards integer ratios - the integer
ratios applying not only between adjacent level 1 IOIs but also between level 1 IOIs and composite
durations formed out of combinations of level 1 IOIs. This embodies the principle that in a quantised
sequence, individual IOIs should not only be in integer relationship with their neighbours, but also
with the larger beat units to which they contribute. A quantised sequence of semiquavers, for example,
exhibits 1:1 ratios at the lowest level, 1:2 ratios between each semiquaver and the quaver unit of
which it is a part, and 1:4 ratios between each individual semiquaver and the crotchet unit formed out
of a group of four.
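Desain and Honing's actual network is considerably more elaborate than this (in particular, it also steers the composite durations formed out of combinations of level 1 IOIs). The sketch below illustrates only the core idea under a stated simplification: adjacent inter-onset intervals are repeatedly nudged toward the nearest integer ratio while the total duration is preserved. The function names, the relaxation rate and the iteration count are all illustrative assumptions.

```python
def relax_pair(a, b, rate=0.2):
    """Nudge a pair of inter-onset intervals (a, b) so that their ratio
    moves toward the nearest integer, keeping the sum a + b constant."""
    swap = a < b
    if swap:
        a, b = b, a
    target = max(1, round(a / b))      # nearest integer ratio >= 1
    total = a + b
    ideal_b = total / (target + 1)     # the shorter interval, were the
    b += rate * (ideal_b - b)          # ratio exactly `target`; move
    a = total - b                      # part of the way there
    return (b, a) if swap else (a, b)

def quantise(iois, iterations=100):
    """Iteratively relax all adjacent IOI pairs toward integer ratios."""
    iois = list(iois)
    for _ in range(iterations):
        for i in range(len(iois) - 1):
            iois[i], iois[i + 1] = relax_pair(iois[i], iois[i + 1])
    return iois

# A slightly uneven long-short-short pattern settles onto 2:1:1 proportions:
print(quantise([0.48, 0.27, 0.25]))
```

Even this crude version shows the 'migration' behaviour: non-integer durations slide toward nearby integer-ratio configurations, the valleys of the rhythm space described below.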
Desain & Honing's connectionist quantiser (which exists in both ‘static' and ‘process' versions) is
able to quantise with impressive flexibility and robustness, producing correct representations of
precisely those kinds of continuously varying sequences which conventional quantising methods
cannot handle. One extension of the connectionist model (Desain, 1992a) demonstrates emergent
metrical behaviour, and another (Desain, 1992b) shows how simple short sequences of continuously
variable durations migrate in the quantiser towards a small number of rhythmic ‘nodes' - providing
a highly contoured ‘rhythm space' in which non-integer rhythms slide down the gradients of the
space towards integer valleys. In this work in particular, the relationship between categorical
perception and quantisation can be seen (as the authors themselves note): the valleys of the rhythm
space are rhythmic categories, and the gradients represent the instabilities and migration paths of
within-category variation.
A fundamental question, however, is whether people are doing anything remotely like this when they
listen to music. The ‘filtering' approach of most commercial quantisers certainly seems to be
nothing like human perception (and behaves quite differently), but even the approach adopted by
Desain and Honing may bear little resemblance to human perception despite the more attractive
principles on which it is based and the more interesting behaviour that it displays. Their approach
results in a more flexible and effective transcription process, but does it embody principles which
have psychological reality - which can explain perceptual processes? Is transcription into standard
music notation really identical with, or even analogous to, the perception of musical rhythm?
5. Perception of rhythm, Perception of events
What is the perception of rhythm? In attempting to address this question, a fundamental difficulty is
the diversity of definitions and characterisations of rhythm, as Parncutt (1994) has also noted - and in
particular the relationship between rhythm and metre. If discussion is confined for the moment to a
consideration of metrical rhythms, it is clear that a very significant component of rhythm perception is
simply the perception of metre in rhythmic sequences. It is extremely unsettling for a listener not to be
able to identify the metre of a sequence that s/he believes to be metrical, and many aural training
exercises are essentially aimed at helping listeners to become quicker and more adept at identifying
the metre of a sequence. Parncutt (1994) expresses a similar view, though he focuses on the less
differentiated notion of pulse rather than metre. Indeed, after a discussion of the difficulties of
defining rhythm, his definition is: "A musical rhythm is an acoustic sequence evoking a sensation of
pulse" (p. 453). One approach to the perception of rhythm is thus to regard rhythm not as an object but
as a medium - or (to use the language of event perception) as information for events.
What are the events that are specified in rhythm? One such ‘event' is metre itself: metrical rhythmic
sequences specify a particular metre, and this metre is the extended event that we perceive in the
rhythm. The enormous variety of individual durations that may make up a metrical rhythmic sequence
will (if the sequence can be perceived as having a stable metre) possess an invariant property that
specifies the metre. This is the standard ecological situation of ‘the detection of invariants under
transformation', and from a different theoretical perspective has been investigated by empirical studies
of metre perception (e.g. Lee, 1991; Parncutt 1994). Another kind of event that is specified in rhythm
is a figure or group: in the same way that the rapid collection of impacts at the start of a game of pool
specifies the ‘break' (an event), so too a particular pattern of durations in a musical sequence may
specify a particular figure or group - even if we do not have a specific name for it. These figures or
groups may themselves have a strong ‘real world' component, too - in that they may be heard to
have particular motional origins (limping rhythms, stately rhythms, rushing rhythms) whether these
are real or virtual.
How does this perspective relate to the issues of quantisation and categorical perception? In a paper
aptly entitled "Events are perceivable but time is not", Gibson (1975) points out that event perception
"asserts that when an event has been perceived there are two kinds of concurrent awareness, one of
variation and one of non-variation. That is to say the observer perceives both what is altered and what
remains unaltered in the environment." (p. 298, Gibson's emphasis). Categorical perception and
quantisation can be seen in exactly the same light: when listening to music, we are concurrently aware
both of fluctuations of tempo (which may specify expression, the hesitations of incompetence,
anxiety, or lack of control) and the rhythmic events (figures, groups and metre). Quantisation presents
this as the separation of microvariations from underlying canonical values: "Quantization is the
process by which the time intervals in the score are recovered from the durations in a performed
temporal sequence; or to put it another way, it is the process by which performed time intervals are
factorized into abstract integer durations representing the notes in the score and local tempo factors."
(Desain, 1992b, p. 240). The conventional approach to categorical perception, similarly, presents the
problem as one of ‘separating the wheat from the chaff', of the elimination of ‘noise' so as to
recover stable underlying values. But this is to overlook the obvious perceptual value of the
‘non-categorical' component of perception (as discussed above), and the different kinds of events
and transformations that this conveys. Expression and event structure in rhythm collaborate in a highly
integrated fashion, as when expression acts to clarify (or, equally, intentionally to call into question)
structure.
6. Conclusion
In this paper I have argued that categorical perception is not a ‘special feature' of some sensory
systems, or of some kinds of materials: it is the inevitable consequence of our sensory systems'
fundamental orientation towards event perception. The events of the environment (in all its natural
and cultural diversity) are what we are sensitive and attuned to, and it is events that we perceive. The
categories of categorical perception are no more and no less than those events, and through perceptual
learning different individuals under different circumstances will demonstrate differential sensitivity to
those events. Since metre is a fundamentally important component of (metrical) rhythm, it is not
surprising to find evidence of what has been called ‘categorical perception' of rhythm - closely
linked to the distinctions between different metres. By analogy, categorical pitch perception (see
Burns, 1999 for an overview) is the inevitable consequence of the importance of fixed pitches (and
discrete intervals) in the Western system. Metre, however, is not the only kind of event in rhythm
(figures and gestures are some others), and the so-called non-categorical components of rhythm
(expressive features and other aspects of continuous temporal variation) convey a host of other kinds
of events and their transformations. In short, categorical perception as a concept seems to offer little in
the way of explanatory value.
References
Burns, E. M. (1999): Intervals, scales, and tuning. In D. Deutsch (Ed.) The Psychology of Music.
Second Edition. New York: Academic Press, p. 215-264.
Clarke, E. F. (1987): Categorical rhythm perception: an ecological perspective. In A. Gabrielsson
(Ed.), Action and Perception in Rhythm and Music. Stockholm: Royal Swedish Academy of Music, p.
19-34.
Clarke, E. F. (1989): The perception of expressive timing in music. Psychological Research, 51, 2-9.
Desain, P. (1992a): A (de)composable theory of rhythm perception. Music Perception, 9, 439-454.
Desain, P. (1992b): A connectionist and a traditional AI quantizer, symbolic versus sub-symbolic
models of rhythm perception. Contemporary Music Review, 9, 239-254.
Desain, P. & Honing, H. (1989): The quantization of musical time: a connectionist approach.
Computer Music Journal, 13 (3), 56-66.
Fraisse, P. (1956): Les Structures Rythmiques. Louvain, Paris: Publications Universitaires de
Louvain.
Gibson, J. J. (1975): Events are perceivable but time is not. In J. T. Fraser and N. Lawrence (Eds.)
The Study of Time II. Berlin: Springer Verlag, p. 295-301.
Harnad, S. (Ed.) (1987): Categorical Perception: The Groundwork of Cognition. Cambridge: Cambridge University Press.
Lee, C. S. (1991): The perception of metrical structure: Experimental evidence and a model. In P.
Howell, R. West and I. Cross (Eds.) Representing Musical Structure. London: Academic Press, p.
59-127.
Livingstone, K. R., Andrews, J. K. & Harnad, S. (1998): Categorical perception effects induced by
category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24,
732-753.
Longuet-Higgins, H. C. (1979): The perception of music. Proceedings of the Royal Society of
London, B 205, 307-322.
Longuet-Higgins, H. C. & Steedman, M. (1971): On interpreting Bach. In B. Meltzer & D. Michie
Proceedings paper
Vera Milankovic
Gordana Acic
Performance in Education: a Field to Explore
Introduction
The Serbian music education system embraces a wide range of children whose abilities in acquiring
musical skills vary. A major instrument and solfege together provide a rather extensive curriculum. Major-instrument
instruction takes the form of private lessons, while solfege is taught as a class ensemble.
Performing outcomes differ owing to differences in vocal and instrumental abilities, teaching methods,
practice hours, etc. A detailed investigation of their respective roles is necessary.
Given the busy schedules of all participants in the educational process, such a subtle investigation is not easy to
carry out. The teacher's role can hardly be overestimated, since teachers convey key messages that are built into
educational theory and philosophy and implemented in technical and didactic practice.
In music education, the analysis of performance should articulate the main features of students' performance in a
context meaningful to both teachers and researchers. Teachers can thus evaluate the effects of their teaching
methods, and scholars their conclusions about musical ability and performance.
Theory
Music communicates aestheticised emotions through a cognitive framework; creative music education
should therefore establish and sustain an emotional and aesthetic communion between teacher and student. At first, music
is used as a medium to establish this communion; later, as it develops into an aestheticised emotional relationship,
music becomes its goal.
Music, played and listened to, is an interplay of emotional and aesthetic relationships within the minds of
performer and listener. While playing, the performer balances the emotional and aesthetic aspects of the performance. This
process should run in parallel in the imagination of the listener, provided the reception is creative: while
listening, the listener should experience a reinterpretation of the same music.
A creative teacher should search for the ideal, a model of perfection, in order to engage the student "body and
soul" in the joint effort to reach it. The ideal is experienced as a model of perfection only as a whole;
only a holistic approach to the ideal reveals its metaphoric meaning.
The teacher's role is to introduce, develop and engage the student's imagination, so that the student can use it
when inspired to reach the ideal.
The teacher articulates the body motions and feelings which accompany the realisation of the ideal, enabling a long-term
kinaesthetic impression of the quality of performance.
The teacher initiates the development of the student's control over motion and feeling by enabling him or her to
become his or her own audience, on the feedback principle, which works in both directions.
Music is a performing art, therefore music pedagogy is a performing art par excellence.
The aims of solfege are to:
1. develop the cognitive processes necessary for reading and understanding music, an understanding which is
profound only if it includes emotional involvement;
2. develop vocal skills enabling an articulate understanding of music, however uncomfortable for the human voice.
The main tool of solfege is solmisation, which determines tonal interrelationships (latent harmony etc.).
Our study should be viewed within the model of creative and effective music education proposed below.
[Model (diagram in original): the ideal, as a cognitive-emotional aim (tone quality, style connotations, technical
idiom, form), linking the teacher, teaching procedures, solfege, the student as performer, the audience, and the
evaluation of the interpretation.]
The students were expected to sing and play each song in succession. Their performances were tape-recorded.
Before and after the performance, each student was asked to complete a short questionnaire regarding his or her
approach to preparing the performance, a comparison between his or her vocal and instrumental performance,
and a self-estimation of the quality of aural control:
1. successive (on-going) and
2. retrospective.
The recordings were made during the examination period at the end of the first term. No special instructions about the
performance or the aim of the study were given beforehand.
The audio tapes of the performances were given to five secondary-school teachers (one piano teacher, three solfege
teachers and one music history teacher) for evaluation.
1. The teachers evaluated the following parameters:
intonation,
rhythm,
tempo,
phrasing,
emotional1 and
aesthetic expression.
These parameters were rated in singing and playing respectively. Since pianists predominated, we omitted intonation
as a parameter in playing.
1 Emotional expression means perceived emotional involvement in the performance.
Results:
1. A significant positive correlation was found between performance ratings in singing and playing (Spearman
coefficient = 0.86, p = .001).
2. A significant difference (Mann-Whitney U test) in mean ranks for singing and playing performance was found
between the groups with self-estimated good and poor on-going aural control in singing.
The quality of on-going aural control in singing therefore turned out to be an important correlate of success in
singing, as well as in playing.
However, the quality of on-going aural control in playing made no significant difference in mean ranks for either
playing or singing performance, and no correlation was found.
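For readers who wish to reproduce this kind of analysis, Spearman's coefficient is simply the Pearson correlation computed on rank-transformed scores (with tied scores sharing the mean of their ranks). The sketch below uses hypothetical ratings for six students, not the study's data.

```python
def ranks(xs):
    """Average 1-based ranks, with ties sharing the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend over a run of tied values
        avg = (i + j) / 2 + 1           # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical singing vs. playing ratings for six students:
singing = [4.5, 3.0, 5.0, 2.5, 4.0, 3.5]
playing = [4.0, 3.0, 4.5, 2.0, 4.5, 3.0]
print(round(spearman(singing, playing), 2))
```

A high positive rho, as in the study, indicates that students ranked highly in singing also tend to be ranked highly in playing.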
In order to test students retrospectively (post factum aural control), we asked them to compare their own
performance outcomes in singing and playing. The students turned out to be quite aware of
their success in performance, and in the great majority of cases their estimates were correct.
3. There is an evident difference between the overall performance ratings of the Flint Stones tune and those of the
other three. Flint Stones was the only scored tune that all participants recognised before the performance simply by
looking at the score.
4. In an attempt to shed light on the potential structuring of the teachers' ratings, we submitted all ratings to
hierarchical cluster analysis, seeking the optimal number of latent dimensions to cover them all. Two
dimensions proved optimal.
Reliability among teachers was fairly strong, with coefficients for parameter totals ranging from .80 for
emotional expression in singing to .96 for intonation.
As shown above, the best performed and best evaluated parameters in playing were rhythm, tempo and phrasing, while
intonation and the aesthetic-emotional aspects were rated lower (below 3.5 on average).
Intonation was the most varied parameter in performance but, at the same time, the most consistently judged by the teachers.
Conclusions
We support
1. the introduction of the IDEAL in all forms of music education;
2. two forms of instruction, private lessons and master classes, because they encourage the emotional and aesthetic
aspects of musical communication;
3. mutual responsibility between teacher and student (in private lessons) and/or between students (in
master-class sessions), stimulating a particular motivation related to performing music;
4. performance analysis and self-evaluation as useful tools for testing educational technology, investigating
teachers' ratings as a reception-evaluation process, and the respective roles of all the parameters involved.
We plan research on the performing techniques engaged in achieving the IDEAL that are common to different kinds of
musicianship.
Proceedings paper
The main aim of this paper is to describe a means of collecting data about everyday music use that is both comprehensive and contemporaneous. It provides a fine-grained map of the musical
world of individuals without relying on their reminiscences and generalizations. We believe such a method is necessary to answer satisfactorily some important questions about music use for
which other methods are not particularly well suited.
Our primary focus is on the situations in which musically untrained listeners (who constitute the vast majority of Western populations) experience music as they go about their everyday lives.
When and where do they engage with music; what are they doing while they engage with it; who are they with; how intentional and purposive is that engagement; and what psychological
outcomes follow from that engagement? Such an approach aims to capture the richness and diversity of everyday musical experience whilst taking into account the social context in which music
listening occurs.
Methodological considerations
Much of the music psychological literature starts with a particular musical work, or fragment of that work, and investigates the psychological response to that work in a sample of listeners who
are exposed to it, usually in a situation devised and controlled by the experimenter. In the vast majority of such studies, the experimental situation closely resembles the paradigm of the Western
classical concert (Frith, 1996; Cook, 1998) where listeners sit in still and silent concentration, attending to musical materials determined and prepared by others, with external distractions
reduced to a minimum. For some purposes, such as investigating the maximal capacity of perceptual or cognitive mechanisms, this may be a satisfactory, even desirable, simplification. For other
purposes, connected with investigation of the social and emotional functions of music, it is clearly inadequate. This is primarily because the experimental paradigm severely curtails the choice
and autonomy of the listener. Many relevant psychological mechanisms are simply, thereby, removed from the field of potential investigation.
The dominant response within music psychology to the deficiencies of the experimental method has been to tap the generalized knowledge of listeners about their own music engagement through
retrospective verbal data. These data can be in the form of categorical responses to investigator-designed questionnaires (e.g., Likert scales) that are subject to multivariate statistical analysis
(e.g. Behne, 1997), or in more open-ended forms of discourse that may be subject to qualitative analysis (e.g. Sloboda 1992, 1999). Such data provide a rich source of themes and hypotheses, but
they suffer from the general deficiencies of retrospective investigations. Participants may recall events on the basis of their centrality to core personal themes and preoccupations, or because their
atypicality makes them more memorable, rather than because they represent predominant or habitual modes of listening. Generalized questionnaires can both limit enquiry to the concepts and
areas specified by the investigator, and also encourage respondents to take positions on issues which they have never really thought about before.
A less utilized but promising line of investigation involves experimental interventions within a wider range of music engagement contexts than the "concert hall simulation" of the laboratory. For
instance North and Hargreaves (in press, 1997) report a series of studies in which musical parameters have been manipulated in everyday situations such as shops and supermarkets, restaurants
and canteens, aerobics and meditation classes. This allows the identification of musical features that are deemed by participants to be more or less appropriate to the activity concerned. It also
allows for the experimenter to identify direct influences of music on behaviour, which may or may not be available for conscious report by participants.
Although there is no form of data gathering which can entirely escape the problems of investigator effects, this paper outlines a method which comes as close to direct observation of daily
musical life without intervention as is practically and ethically permissible. For obvious reasons, surveillance, via video and audio recording, is not a meaningful option, and in any case such
surveillance would have to be accompanied by verbal questioning to elicit information about inner responses.
The method chosen here is an adaptation of the Experience Sampling Method (ESM), as described by Csikszentmihalyi and Lefevre (1989), in which participants are signalled via electronic
pagers at random intervals during the day. At each paging they are required to complete a brief response form relating to current or immediately preceding experience. In this way, we are able to
examine individuals' subjective experiences during 'real' evolving musical episodes in the context of everyday life situations.
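As a rough illustration of the scheduling side of experience sampling, the sketch below draws random paging times across one waking day. The signal count, day boundaries and minimum spacing are hypothetical assumptions for the example; they are not the parameters of Csikszentmihalyi and Lefevre's procedure or of the present study.

```python
import random

def paging_schedule(n_signals=8, day_start=8 * 60, day_end=22 * 60,
                    min_gap=30):
    """Draw random paging times (minutes since midnight) for one day of
    experience sampling, keeping successive signals at least `min_gap`
    minutes apart. Uses simple rejection sampling: redraw the whole day
    whenever any pair of signals falls too close together."""
    while True:
        times = sorted(random.randint(day_start, day_end)
                       for _ in range(n_signals))
        gaps = [b - a for a, b in zip(times, times[1:])]
        if all(g >= min_gap for g in gaps):
            return times

schedule = paging_schedule()
print([f"{t // 60:02d}:{t % 60:02d}" for t in schedule])
```

Enforcing a minimum gap keeps signals unpredictable to the participant while preventing two response forms from landing in the same episode.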
Music episodes        81   32   0    19   17   3    2    2
Non-music episodes    96   3    73   5    1    6    13   1
Total episodes        177  35   73   24   18   9    15   3
Participants were also able to freely respond to the question "What was the MAIN thing you were doing?". Reported activities were coded post-hoc according to the classification shown in Table
2. There were three main categories, personal, leisure and work. Personal activities cover those everyday activities which are a necessary consequence of living, and are further divided into states
of being, maintenance activities and travel. Leisure activities were divided into three sub-categories, including music-related activities in virtue of the focus of this study. Work activities were
categorized according to whether they were primarily solitary (self) or primarily group-based (other). Table 3 shows the distribution of episodes over activity category.
Table 2
Categorisation of activities
Personal - being states of being (e.g. sleeping, waking up, being ill, suffering from hangover)
Personal - maintenance washing, getting dressed, cooking, eating at home, housework, shopping
Leisure - music listening to music (n.b. no examples of performing music were found)
Leisure - passive watching TV/film, putting on radio, relaxing, reading for pleasure
Leisure - active games, sport, socializing, eating out, chatting with friends
Listening to music as a main activity accounted for a small minority of all episodes (2%). Episodes were divided roughly equally between personal, leisure, and work. There was a significant
difference in frequency of musical episode as a function of activity. A one-way ANOVA (F (8,347) = 18.58, p < 0.0001) with Tukey tests showed that personal-maintenance, personal-travel, and
leisure-active were not significantly different from each other, but showed a significantly greater proportion of music episodes than all other categories, which were not significantly different
from each other (7 out of 8 participants showed this ordering). It is particularly notable that few episodes of "music while working" were reported in this predominantly academic group of
participants.
Table 3
Activity of episode as a function of music being present
Music episodes: 1 4 46 35 6 20 28 16 0
Non-music episodes: 7 23 25 10 0 40 9 68 18
Total episodes: 8 27 71 45 6 60 37 84 8
Reasons for the activity were roughly equally divided between "I had to do it", and "I wanted to do it", with the other two categories (it was a basic part of my routine, and I had nothing else to
do) accounting together for less than 15% of the total episodes. A one-way ANOVA (F (3, 290) = 8.47, p < 0.0001) with Tukey tests showed a significant effect of reason on distribution of music
episodes. Only one third of "I had to do it" episodes involved music, whereas two-thirds of the other types of episode involved music (effect found in 5 out of 8 participants).
Outcomes of music listening and autonomy
The third question relates to choice and autonomy. If, as is suggested above, this is a potent variable, then the degree of choice over the music being heard should affect psychological outcomes.
The study, therefore, assesses both the degree of choice exercised over the music, and a range of outcome measures, including mood-change, and estimates of the degree to which the music
contributed to valued outcomes. It is predicted that there will be a positive relationship between level of autonomy and valued outcomes.
For each music episode participants rated their mood on ten 7-point bipolar scales both before and after the music. The ten "before music" mood scores were subject to principal component factor
analysis with varimax rotation. A three-factor solution was obtained. Factor 1 accounted for 36% of the variance. Factor 2 for 14% and Factor 3 for 12%. Factor 1 was designated POSITIVITY,
loading highly on distressed - comforted (0.62), happy - sad (-0.58), irritable - generous (0.64), secure - insecure (-0.59) and tense - relaxed (0.86). Factor 2 was designated PRESENT
MINDEDNESS, loading highly on interested - bored (0.69), involved - detached (0.76), lonely - connected (-0.64), and nostalgic - in-the-present (-0.75). Factor 3 was designated AROUSAL,
loading highly on alert - drowsy (0.82), and energetic - tired (0.82).
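The factor solution described above can be sketched directly with NumPy: principal-component loadings from the correlation matrix, followed by a varimax rotation. This is an illustrative reconstruction under stated assumptions (ten scales, three retained factors, synthetic data), not the authors' actual analysis code:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation of a factor-loading matrix (standard Kaiser algorithm)."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # gradient of the varimax criterion with respect to the rotation
        G = loadings.T @ (L ** 3 - L * (np.sum(L ** 2, axis=0) / p))
        U, S, Vt = np.linalg.svd(G)
        R = U @ Vt
        new_var = S.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

def pca_loadings(X, n_factors=3):
    """Principal-component loadings from the correlation matrix of X
    (rows = episodes, columns = bipolar mood scales)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    # loading = eigenvector scaled by sqrt(eigenvalue)
    return eigvecs[:, order] * np.sqrt(eigvals[order])
```

Because varimax is an orthogonal rotation, it leaves each scale's communality (the row sum of squared loadings) unchanged, which provides a useful sanity check on the implementation.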
Mood change effects were assessed by a series of analyses examining the difference between mood before and after music exposure. The distribution of mood change was examined by
subtracting mood factor value before music from that after it. In each case the mean change over all episodes was positive. Music tended to increase arousal, present-mindedness and positivity.
However, there were instances of no change and negative change.
Table 4
Direction of mood change over music episodes (number of episodes: increase / no change / decrease)
Arousal 63 82 5
Present-mindedness 90 9 48
Positivity 71 71 11
High instances of no-change were found on the arousal and positivity factors (see Table 4). Present-mindedness tended to be more changeable in response to music. There were only nine
instances of "no change" in the whole set. However, correlations between factor change scores showed that all three factors tended to change together and in the same direction (p<0.001 in all
cases).
We were interested to see whether there were any cases in which mood change factors were dissociated from one another (i.e. cases where one mood increased simultaneously with another mood
decreasing). Inspection of cases showed that there were 25 such episodes in the entire sample (16% of all music episodes). These cases are particularly important in beginning to identify distinct
functional niches for music engagement. The largest group of such episodes (10) involved increases in positivity along with decreases in present-mindedness.
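Flagging such dissociated episodes is a simple vector operation: an episode qualifies when at least one factor's change score is positive while another's is negative. A minimal sketch, assuming a hypothetical data layout of one row per episode and one column per mood factor:

```python
import numpy as np

def dissociated_episodes(before, after, eps=0.0):
    """Return indices of episodes in which one mood factor rose while
    another fell.

    before, after: (n_episodes, n_factors) arrays of factor scores
    measured before and after the music."""
    change = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    rose = change > eps    # factors that increased
    fell = change < -eps   # factors that decreased
    # an episode is dissociated if at least one factor rose AND one fell
    return np.where(rose.any(axis=1) & fell.any(axis=1))[0]
```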
One example of this comes from a male participant reporting being at home relaxing with a group of friends and acquaintances. The activity was being done out of choice. There was ambient
music playing on a CD, although the participant had not chosen it. The participant commented that "the music was very tranquil and relaxing", that others present were "discussing work
boringly" and that he was "very, very tired". This episode was also associated with a decrease in arousal during the music. It would be reasonable to assume that the participant was using the
music as a means of relaxing and disengaging from the surrounding conversation.
A second example from the same category is provided by a female participant who reported being at home, tidying a bedroom as part of the normal basic routine. The participant had chosen to
listen to a piece of pop/chart music on a tape. The participant commented that the music was chosen to "enhance the wonderful experience of cleaning" and was "very lively". This episode was
associated with an increase in arousal during the music. It seems as if the purpose of this music was to allow the participant to focus her attention on the music, and away from the uninteresting
domestic chore, and this focused attention was used to increase energy levels.
It was less easy to find examples of episodes where positivity decreased, but one clear episode involved a female participant at home, alone, doing the washing up as part of the basic routine. She
had chosen to listen to rock music on the radio and commented that the track was "a favourite song I had not heard for some time... It brought back certain memories". The music increased this
participant's nostalgia, sadness, and loneliness, at the same time as making her more alert. It seems clear that this episode reminded the participant of a significant past event that brought on
nostalgia. At the same time it appears as if she had chosen the music to engage and arouse during an uninteresting routine task.
In order to investigate systematically the influence of autonomy or choice on response to music, participants were asked to rate the music in each episode on an 11-point scale according to the
degree of personal choice exercised in hearing the music (from 0 = none at all, to 10 = completely own choice). The distribution of scores over the scale was not uniform, with responses
clustering on 0 and 10. In order to increase cell size, this variable was recoded as a three-point scale where 0-3 = low choice, 4-7 = medium choice, and 8-10 = high choice.
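The recoding described above is straightforward; a minimal sketch (the cut-points 0-3, 4-7, and 8-10 are taken from the text):

```python
def recode_choice(score):
    """Collapse the 0-10 personal-choice rating into the three levels
    used in the analyses: low (0-3), medium (4-7), high (8-10)."""
    if not 0 <= score <= 10:
        raise ValueError("choice rating must lie between 0 and 10")
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    return "high"
```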
One-way analyses of variance were carried out for each mood factor, where the dependent variable was the amount of mood change from before to after the music, and the independent variable
was degree of choice. There was a significant effect of degree of choice on each mood factor. For each mood factor this showed a similar effect - the greater the choice the greater the mood
change. Table 5 shows the mean change scores for each cell, and F-ratios associated with each analysis. For positivity and arousal these effects did not differentiate between participants, each of
whom showed the same effect. For present-mindedness there were significant individual differences (interaction effect F (9,13) = 1.89, p < 0.05), with only 6 out of 8 participants showing the
main effect of choice on mood change.
Table 5
Mood change as a function of degree of choice over music experienced
file:///g|/Tue/Sloboda.htm (7 of 21) [18/07/2000 00:35:44]
There exists a particular attitude to music listening which implictly dominates much psychological... response to music. This is what I have charicatured (Sloboda, 1989) as the "pharmaceutical" model
One-way analyses of choice as a function of place, activity, reason for activity, and companion, showed that some situations are associated with significantly more choice over the music than
others. High choice situations occur when the person is alone, travelling or working, at home or in a vehicle, undertaking activities for duty. Low choice situations occur when with others, during
active leisure or personal maintenance activities, in shops, gyms, and entertainment venues, and when doing activities because one wants to.
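The one-way analyses reported here reduce to the standard F ratio of between-group to within-group variance. A self-contained sketch; the grouping into low/medium/high choice and the change-score values are illustrative assumptions, not the authors' data:

```python
import numpy as np

def one_way_f(groups):
    """One-way ANOVA F statistic for a list of 1-D samples,
    e.g. one sample of mood-change scores per choice level."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = sum(len(g) for g in groups)        # total observations
    k = len(groups)                        # number of groups
    grand = np.concatenate(groups).mean()  # grand mean
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A follow-up Tukey HSD, as used in the paper, would additionally compare each pair of group means against a studentized-range criterion.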
Discussion
Our findings indicate that the Experience Sampling Method is a robust method for examining individuals' subjective experience during 'real' evolving musical episodes in everyday life.
Participants provided responses on over 90% of calls, and we found that 44% of all responses involved music episodes. While generalizations to the population at large are not possible with such
a small sample, it is already clear that music listening is not randomly distributed over contexts. Even on this small sample there were highly significant effects on almost every variable
measured. For instance, music occurred very frequently while participants were travelling, or in public places (such as shops), moderately frequently at home, and less frequently in other
locations. It tended to accompany active leisure (e.g. going out with friends) and maintenance activity (e.g. housework, washing, shopping) more than deskwork or passive leisure (e.g. reading,
watching TV), and tended to accompany activities undertaken by choice rather than for duty. Although most of these findings are not surprising, the link between choice and activities undertaken
for duties is not intuitively obvious. It may be that choosing music to accompany duties is a way of bringing some autonomy and personalization back to them. DeNora (1999) suggests that the
music associated with duties is used as a catalyst to shift individuals out of their reluctance to adopt what they perceive as 'necessary' modes of agency, and into modes of agency 'demanded by
particular circumstances'.
Our findings are consistent with those reported by respondents in the Sussex Mass Observation survey (Sloboda, 1999) where the most frequently mentioned activities involving music were
housework and travel. Although functions mentioned by participants in the survey were varied, they had a predominantly affective character, with many participants (particularly women)
explicitly mentioning music as a mood-changer or enhancer. The most frequently mentioned function was essentially nostalgic. As in the Mass Observation survey, almost no episodes in our
present study had music as the primary focus. Participants were doing something else, with music as accompaniment to that activity. Similar findings were reported by DeNora, who also found
that it was primarily among her respondents over the age of 70, and those who were trained musicians, that conceiving of music as 'background' to anything tended to be considered antithetical. This has
major theoretical implications for how we conceptualize music use. The focussed attentive and "respectful" listening of the "music lover" figures hardly at all in our present sample of
non-musicians. Furthermore, the study has allowed significant progress to be made on typologies for non-musical activities and locations based on the open-ended questions. The prospects for
using the ESM approach to develop further typologies for specific functional niches are promising given our results.
Mood was measured by 10 bipolar scales, both before and after music. Factor analysis yielded a clear and familiar grouping based on valence (positive-negative), arousal (alert-drowsy), and
attention (loading on interested-bored, involved-detached, connected-lonely, and nostalgic - in-the-present). In the great majority of cases there was mood change as a result of music, and such
change was generally in the direction of greater positivity, arousal, and attention. This is, we believe, the first demonstration of robust mood effects of music outside the laboratory.
Notwithstanding the predominance of generalized mood "improvement", not all episodes showed the same pattern. The major result of this part of the study is a strong moderating effect of
choice. Reported mood change is significantly greater for episodes where participants exercise high choice over the music they hear than for episodes where there is little or no choice. The largest
effects are for self-chosen music. About 20% of episodes involved more complex patterns of mood change (such as increases in positivity together with decreases in arousal). There are clear
suggestions that these different patterns arise from the specific niches in which they occur (location, activity, and purpose of music). For instance, some music is chosen to make one feel better
about, and distract one from the boredom of, some routine task, such as cleaning. In this case one would expect increases in positivity with decreases in attention to task. This is exactly what we
found in some episodes.
DeNora (1999) also found examples of self-regulation involving a number of musical strategies described by participants as 'revving up' or 'calming down', 'getting in the mood' (e.g., for a
particular social event), 'getting out of a mood' (e.g., to improve a 'bad' mood or 'de-stress'), 'venting' strong emotions. For the most part, as in the present study, these were predominantly
described at the 'personal' or intrapersonal level as a means of creating, enhancing, sustaining and changing subjective, cognitive, bodily, and self-conceptual states. According to DeNora, the
women in her study 'drew upon elaborate repertories of musical programming practice, and a sharp awareness of how to mobilize music to arrive at, enhance and alter aspects of themselves and
their self-concepts. This practical knowledge should be viewed as part of the (often tacit) practices in and through which respondents produced themselves as coherent social and socially
disciplined beings' (p.35).
Music, when viewed as a cultural resource, provides numerous ways in which musical materials and practices can be used as a means for self-interpretation, self-presentation, and for the
expression of emotional states associated with the self. According to DeNora, a sense of self is locatable in music, in that 'musical materials provide terms and templates for elaborating
self-identity' (p.50). Although viewed as essentially 'private' experiences involving a great deal of autonomy or agency, everyday musical experiences are deeply embedded in a social context
which exerts a powerful influence (albeit often implicitly) on our music listening. Thus, our engagement with music is enmeshed in a social and cultural world where we can 'forget' or become
unaware of the grounds on which our feelings and behaviours are based. This 'forgetting' is the product of years of training, socialization, and the institutionalisation of music. Not only have our
musical practices become routine and invisible, but as musicians and psychologists we are limited in our ability to describe musical materials in a way that is free of the assumptions and biases
associated with our own experiences and training. Further research is needed which seeks to identify and further our understanding of the role of music listening as part of the rules and
conventions of a particular social context and the unfolding episodes in which they occur. The ESM provides a useful approach for capturing the complexity of everyday evolving musical
situations in a way that makes it possible to retrieve some of these 'forgotten' or 'hidden' practices, thus furthering our understanding of the meanings associated with our evaluative judgements of
the functionality of music in everyday experience.
Correspondence concerning this article should be addressed to John A. Sloboda, Department of Psychology, Keele University, Keele, Staffordshire, ST5 5BG, UK. Email address:
j.a.sloboda@psy.keele.ac.uk
References
Behne, K. E. (1997). The development of "Musikerleben" in adolescence: how and why young people listen to music. In I. Deliege and J. A. Sloboda (Eds.), Perception and Cognition of Music
(pp. 143-160). Hove: Psychology Press.
Cook, N. (1998). Music: A very short introduction. Oxford: Oxford University Press.
Csikszentmihalyi, M. and Lefevre, J. (1989). Optimal experience in work and leisure. Journal of Personality and Social Psychology, 56, 815-822.
DeNora, T. (1999). Music as a technology of the self. Poetics, 27, 31-56.
Frith, S. (1996). Performing Rites: On the Value of Popular Music. Oxford: Oxford University Press.
Gabrielsson, A. and Lindstrom, S. (1996). Can strong experiences of music have therapeutic implications? In R. Steinberg (Ed.), Music and the Mind Machine: the Psychophysiology and
Psychopathology of the Sense of Music (pp. 195-202). Berlin: Springer.
Merriam, A. (1964). The Anthropology of Music. Evanston, IL: Northwestern University Press.
North, A. C. and Hargreaves, D. J. (1997). Experimental aesthetics and everyday music listening. In D. J. Hargreaves, and A. C. North, (Eds.), The Social Psychology of Music (pp. 84-103).
Oxford: Oxford University Press.
North, A. C. and Hargreaves, D. J. (in press). Responses to music in aerobic exercise and yogic relaxation classes. British Journal of Psychology.
Sheridan, D. (in press). Mass-Observation revived: the Thatcher years and after. In Sheridan, D., Street, B., and Bloome, D. (Eds.), Writing ourselves: Mass-Observation and literacy practices.
Cresskill, NJ: Hampton Press.
Appendix A
Date: _____________ Time Beeped: ____________ am/pm Time Filled Out: ___________am/pm
SECTION A
What were the principal reasons why you were doing this particular activity? (tick as many as apply)
__ It was a basic part of my routine (e.g. something that is so habitual that I don't really think about my
SECTION B
Who were you with when you were hearing music? (tick as many as are applicable)
__ Alone __ Partner __ person/people you live with __ Family member(s) __ Friend(s) __ Strangers
__ Acquaintance(s) __ person/people you work with __ professional(s) (e.g. dentist)
If with one other person, was that person male ___ or female ___ ?
Here are some ways in which moods or states can be described, paired as opposites.
For each pair, tick the category that most closely describes the way you felt immediately
before the music started. If the way you felt changed while you were hearing music, write
a C in the space that most closely describes where you changed to. If the
way you felt did not change during the music put two ticks in the same place.
[The ten bipolar mood-scale pairs appeared here, e.g. alert - drowsy, energetic - tired.]
Are there any important ways you felt during the music which are not covered by the above list?
If so, please state them. ________________________________________________________________________________
How much personal choice did you have in hearing the music?
If you chose to listen to music, what was your MAIN reason? Please state in your own words.
Was there anything in the music that you found particularly important or noticeable?
_________________________
Since you were last beeped has anything happened or have you done anything that could have
affected the way you feel? ____________
____________________________________________________________
Please write below any additional information or comments about what was happening and/or how
____________________________________________________________
____________________________________________________________
Appendix B
Interview protocol
Prompts:
1. Was this a typical week for you? If not, why?
2. Looking back on the week, what stands out as most significant to you and your experience of the study?
Have you learnt/discovered anything new about yourself or music in general?
Prompts:
Aware of?
Negative/positive feelings?
3. I am going to pick five of the sheets at random and would like you to talk about them in more
detail; in particular: (use diary as guide)
Prompts:
What were you thinking about in particular?
4. Can you imagine your life without music? If not, why not?
Prompts:
What would be missing?
Proceedings abstract
Background:
Aims:
Method:
Results:
Non-musicians discriminated the tones better in the music context than in the
hum context, and better in the hum context than in the speech context. The musicians were better on all three
tasks compared with the non-musicians; and moreover were better in the music
than the speech or hum context. Perfect pitch musicians were equally good in
all three contexts, and better overall than the musicians and non-musicians.
Conclusions:
Speech perception and music perception are not independent. Musical training
and perfect pitch ability "leak" through to speech perception ability. The
specific course of speech perception development is not only influenced by the
surrounding language, but also by other events, e.g. musical training and
ability.
Proceedings abstract
Background:
The literature on the evolution of music is quite sparse, and the
topic is often mentioned only in passing as part of larger proposals
concerning the origin of language and the emergence of modern human
cognitive abilities. Although music, no less than language, is a
uniquely human behavior, most evolutionary scenarios either do not
mention music at all, or make ethnocentric assumptions concerning
the nature of music and its relationship to language, assumptions
that are at odds with findings in ethnomusicology concerning the
social embeddedness and mutual interdependence of music and language
across a wide range of socio-cultural contexts.
Aims:
This paper attempts to articulate an evolutionarily plausible and
socially grounded theory of musical meaning in light of recent
proposals concerning the origins of human cognitive abilities.
Main contributions:
Expanding upon Donald's (1991) suggestion that the capacity for
representation evolved prior to the development of language, this
paper proposes that music is grounded in a capacity for "mimesis",
or motor modeling, and has a social ontology rooted in gesture and
preverbal spatio-temporal concepts. Although both music and language
evolved from a mimetic capacity, musical meaning retains a distinct
link to vocal mimesis through sonic representations of bodily
movement and emotional states.
Implications:
This work challenges both structural and cultural accounts of
musical meaning by suggesting that music's power is not derived
solely from syntactical or semantic referents, arousals and
expectancies, or from its indexical relationships to a particular
cultural context, but rather through its immediacy as a performance
of socio-emotional essence and embodied gesture.
Proceedings paper
Background
The integration of perceptual modes underlying various art forms is a fundamental expression of the
inter-relatedness of artistic thought. Historical concepts such as Johannes Kepler's Harmonia mundi, or
Athanasius Kircher's Musurgia universalis, have asserted a unity of art and science, and a felt unity
of the senses (de la Motte, 1996, p. 311). Throughout history, evidence of continuously evolving
combinations of artistic disciplines suggests that we are capable of "making sense" of art objects in
one medium through the co-influence of another. Recent experiments demonstrate that the structural
and expressive qualities of performance art have strong parallels across media (Krumhansl &
Schenck, 1997), and that meaning can be conveyed across sensory media as well (Smets & Overbeek,
1995). This capacity of the human mind to communicate ideas and understandings cross-modally is
a central aspect of artistic communication and, more specifically, is characteristic of higher-order
musical thinking. We believe there is sufficient evidence in previous research and writings, as well as
in our personal experiences, that when music invokes profound reactions, or when it is felt to be
understood fully, it is in part because music verifies and clarifies knowledge and understandings
acquired through the other senses.
As one step toward a more integrated view of this phenomenon, we introduce the concept of
transactional cognition in music. We call transactional those processes which involve the mutual
influence of otherwise independent forms of sensory or artistic experience. We call cognition that
human behavior which demonstrates an awareness of these processes and makes judgements based
upon that awareness. Transactional cognition is thus related to aesthetic perception and reaction,
creative reasoning, and interdisciplinary thinking in the arts. Our primary interest was to test the
strength of the relationship between alternative domains of sensory information. We designed an
experiment to explore cross-modal correspondences between two types of sensory experiences—those
of hearing (henceforth referred to as "aural") and touch (henceforth referred to as "tactile"). Our
assumption is that such correspondences might arise from a process of encoding aural information
into tactile information, and decoding tactile information back into aural information. Our principal
goal was to determine the success of this encoding/decoding process when we employed contrasting
groups of subjects, and to consider the psychological implications of this communicative process by
means of both quantitative and qualitative information. The lack of research on this topic obliges us to
review studies in cross-modality, synesthesia, and tactile response as those most closely related to our
study.
Cross-Modal, Synesthetic, and Tactile Response Studies
Qualitative research based on introspective accounts of music (Kleinen, 1994) and brain imaging data
(Petsche, Richter & Filz, 1995) suggest that the cognitive processing of music evokes a wider range of
sensory information than is available from the musical sounds themselves. These studies suggest that
the interactions of aural experience and other types of perceptual experience are active in music
cognition. Cross-modal stimulation through music listening, which we view as closely related to
transactional cognition, has become an important aspect of research in music therapy (James, 1984;
Fisher & Parker, 1994).
Research on cross-modal effects in music perception, referred to as synesthesia (Cuddy, 1994), and
cross-modal analogy (Behne, 1992), has identified strong interrelations between the senses. While
synesthesia is characterized as a stable yet idiosyncratic phenomenon which occurs in relatively few
individuals (Behne, 1992), cross-modal analogy is considered more universal (Behne, 1992). For
example, the concept of 'brightness' in pitch perception might be based on a common gradient (i.e.,
salient perceptual dimension) of the visual and auditory domains (Osgood, 1960), making it an
example of cross-modal analogy rather than synesthesia. Such analogy appears to be based on
semantic coding (Osgood, 1960; Martino & Marks, 1999) or common stimulus dimensions.
Tactile perceptions of music, in contrast to visual perceptions or "photisms," have been only marginally
addressed in the literature on music synesthesia (Behne, 1992; Cuddy, 1994). Vibrotactile cues in
singing or instrumental performance have been hypothesized as feedback signals that mediate
performance processes (Verillo, 1992). Despite the prominence of physical responses to music
experiences as reflected in numerous studies of spontaneous peripheral-physiological reactions to
music (Bartlett, 1996), tactile responses in music have curiously escaped systematic investigation.
Tactile and other bodily sensations have been reported in verbal accounts of music experiences
(Kleinen, 1999; Sloboda, 1991), and in studies of autonomic processes of the human nervous system
in response to music listening (Goldstein, 1980; Panksepp 1995). It appears likely that tactile
metaphors in music descriptions are not based on mere linguistic convention, but rather appear to
reflect psycho-physiological mechanisms. Spontaneous pilomotoric reflexes to music, e.g., "thrills",
or "chills" (Goldstein, 1980; Panksepp, 1995) appear related to the release of hormones that modulate
stress (McKinney et al., 1997). Tactile responses to music such as "chill" sensations occur across a
very large population irrespective of musical training, and thus provide a tangible object for the study of
the aesthetic experience of music.
Method
Stimulus materials
Eight graduate art students specializing in ceramics from a large university in the midwestern United
States participated in the first phase of this experiment. Three instrumental musical excerpts of
differing musical styles were selected for presentation to these ceramicists as follows:
Excerpt 1— "Le Repos de la Sainte Famille" from: Hector Berlioz, L'ENFANCE DU CHRIST,
English Chamber Orchestra, Philip Ledger, conductor, Thames DCD 452.
Excerpt 2— "River of Orchids" from: XTC, "Apple Venus, Volume 1," TVT Records 3250-2.
Excerpt 3— "Kismet" from: Anokha, "Soundz of the Asian Underground," Quango
314-524-341-2.
These excerpts were selected for their diversity of styles and the predominance of various musical
elements. To briefly describe the character of these excerpts: "Le Repos de la Sainte Famille" is
orchestral program music from the early nineteenth century, dominated by strings and a slow tempo; "River
of Orchids" is progressive rock music from England, featuring string and brass figures organized in a
cyclical composition; "Kismet" is underground dance music, composed mostly of electronic timbres,
Indian-derived tabla accompaniment, and a moderately fast tempo. Each excerpt was digitized using
sound synthesis software, and edited to a length of exactly 1:35, including a five-second digital
fade. The digital excerpts were written to compact disk and presented on a portable Aiwa stereo
system fitted with Sony stereo headphones.
The ceramicists were administered this treatment in two groups of four. They were seated in their
regular studio and asked to listen to one of the three excerpts and interpret the music on the surface of
a 12 x 12-inch square tile of wet clay. They were provided with the following directions:
In this study we are investigating the relationship between tactile and aural experiences. We would like
you to create a ceramic tile which represents your interpretation of a short musical excerpt. We would
also like to record your impressions of the task. Your identity in this study will be kept confidential.
Please listen carefully to the music excerpt. The first listening is just for you to become acquainted with
the music. We will repeat the music excerpt again, during which you may start to work on the clay
which you have in front of you. You may choose to hear the music excerpt repeated at any time.
However, after ten minutes we will play the excerpt again to remind you that you are at the midway
point, and for a final time after twenty minutes, or when you are finished, as a means of confirmation.
You have a maximum time limit of twenty minutes to create your ceramic tile.
We ask you to try to express your own perception of the music or aspects of the music which appear
prominent to you by means of the material in front of you. For this purpose your eyes will be
blindfolded until you are finished with your work. There are no restrictions in the way in which you
work the clay, but we ask you to leave it in just one piece and to preserve, if possible, the overall shape
of it. Any questions? You may begin now.
The single question that arose from the ceramicists was our policy on their use of ceramic "tools,"
which we permitted for those who requested them. The tools were limited to those available in the
ceramics studio—small wooden implements and sponges.
Procedure
Forty elementary education majors (33 female, 7 male) from the same university participated in the second phase
of this experiment. The subjects were tested in groups of four. Each subject was seated at an Apple
Macintosh computer equipped with web browser software and stereo headphones. The three music
fragments were presented on the browser screen as "streamed" audio files sampled at 32-bits per
second. In front of each subject we placed a randomly-assigned tile (produced earlier by the
ceramicists) which was concealed in a box on the workspace in front of them. The subjects were read
the following directions:
In this study we are investigating the relationship between tactile and aural experiences. Please listen
carefully to all three musical excerpts on the web page in front of you. Listen to each one again, this
time with your hands on the ceramic tile inside the box, and after each excerpt, answer the questions on
your response sheet by placing a vertical dash on the darkened scale. If possible, also describe the
experience in your own words on the lines provided. You have ten minutes in all to complete this task. I
will notify you when you have two minutes left. Thank you for participating in this study.
The subjects were asked to answer the question "How well does this music relate to the sensation of the
ceramic tile?" for each of the three musical excerpts. They indicated their response on a horizontal
scale ranging from "very well" to "very poorly." The subjects were also encouraged to use their own
words to describe the relationship between tactile and aural sensations for each of the three excerpts.
Since one of the three excerpts had been used to create the selected tile by one of the ceramicists, it
was noted as the "target," while the other two excerpts were coded as "distractors."
The subjects’ ratings of the excerpts were converted to scores and used as the dependent variable in
all subsequent analyses. Target vs. distractor, presentation order, and gender served as independent
measures. An analysis of variance (ANOVA) using the three independent variables was conducted.
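As a rough illustration of the target-vs-distractor comparison, the core of such an analysis can be sketched as a one-way ANOVA on invented rating data (the ratings, scale endpoints, and use of SciPy are assumptions for illustration only; the authors' actual design also crossed presentation order and gender):

```python
# Minimal sketch of a target-vs-distractor comparison; the data are invented.
from scipy import stats

# Hypothetical ratings in mm on a 0 ("very poorly") to 100 ("very well") scale.
target_ratings = [72, 65, 80, 58, 69, 74, 61, 77]
distractor_ratings = [41, 55, 38, 47, 52, 44, 49, 36]

# One-way ANOVA comparing the two groups of ratings.
f_stat, p_value = stats.f_oneway(target_ratings, distractor_ratings)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

With only two groups, this one-way ANOVA is equivalent to an independent-samples t-test; the full three-factor design reported in the paper would require a factorial model.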
Results
There was a highly significant effect of target vs. distractors [F(1,38)=10.632; p<.002] (cf. Fig. 1).
However, there was no significant effect of presentation order [F(1,118)=.535; p=.587], nor of gender
[F(2,76)=1.489; p=.232]. A multivariate analysis revealed no significant interactions among these
variables.
Figure 1: Ratings of the subjective relationship between aural and tactile impressions (score in mm;
N= 40). Means, standard deviations, and range are displayed. Mean differences for target vs. distractor
items are significant (p<.002).
Discussion
In this two-phase aural/tactile music experiment we found what appear to be meaningful transactional
cognitive processes between these two domains of perception. Our results suggest that the ceramicists
were able to transform the aural impression meaningfully into a tactile representation of relevant
aspects of the musical events. Further, the novice subjects were in turn able to derive sufficient
information from the tactile representations to successfully identify which music excerpt had been
transformed. In the context of our experiment, there appears to be a connection between tactile and
aural perception and reasoning. These results raise a number of issues.
The specific nature of the connection noted above is not evident, nor is the extent to which common
gradients overlap in aural and tactile experiences. Further, we have yet to identify the specific
psychological factors and task constraints that might empower or inhibit transactional cognition in
music. We will, however, provide some observational data here that bear on these questions.
Visual inspection of the ceramic tiles created in the first phase of the experiment revealed a
heterogeneous mixture of mainly abstract works, which to different degrees also resembled concrete
visual objects. We further observed that in some cases the surface appearance of the clay related to
synchronous movement of the hands to the music, while in other cases there was no direct relation of
musical rhythms to surface appearance. Although it is impossible to give thoroughly comprehensive
descriptions of the tiles, it seems worth noting that task constraints were not considered as a major
negative influence by the artists.
The ceramicists had little or no experience with the required task. Yet, the communication of this task
to the subjects was apparently not problematic. Implicitly, it seemed clear to them what was required,
although it was entirely left open how to fulfill the requirement under the constraints of the situation
and the available materials. The imaginative transfer of information from aural to tactile form by
ceramicists might not be equated with the invention or the idiosyncratic construction of a meaningful
relationship between these perceptual domains by the artists. Rather, according to semantic coding
theory (Osgood, 1960; Martino & Marks, 1999), the creative process might be guided to some extent
by semantic gradients. Although the artists lacked extensive formal musical knowledge, they verbally
expressed sensitivity to particular structural features of their aural images. Music cognition research
(e.g. Krumhansl, 1990; Bharucha, 1995) suggests that ordinary listeners acquire some implicit formal
knowledge of music structures by mere exposure to the musical artifacts in their socio-cultural
environments. Such implicit structural knowledge of music might have influenced the process of
matching aural and tactile perceptions. However, since the subjects in the second phase of the
experiment were elementary education majors, the effect of formal music training in their
decision-making cannot be readily determined. In summary, informal observation of (and interaction
with) both groups of subjects revealed that the combined tasks were regarded as highly appropriate and
enjoyable by the majority of the subjects. Results of the questionnaires that were administered to both
groups of subjects will be analyzed and presented in a later study, at which time we will address the
psychological factors which mediate the systematic processing and transfer of sensory data across
domains.
Implications for Musical Development and Education
Traditional music instruction has focused on the development of performance and listening skills
while the antecedents of creative activities, those which invoke expressive and interpretative thinking
and gestures, have been comparatively de-emphasized. Given that music responses are often
strengthened if several perceptual domains are involved, it seems important to consider how these
research findings might help improve music instruction.
Sensitivity to musical expression in performance and listening, while arguably the most important
component of musicality, is not typically an organic part of the learning process, owing to its highly
personal and
intuitive nature. However, expressive and interpretative musical thinking are considered highly-prized
outcomes of competencies in other areas such as reading notation, motor skill proficiency, idiomatic
knowledge, and so forth, which raises the question of whether traditional forms of music instruction
are internally consistent. Music instruction can be made more meaningful and satisfying if students
are encouraged to make interpretations in sound, that is, to explore connections between music and
other types of sensory experiences, for therein lie the creative possibilities of the medium. Guidelines
for these explorations should take into account the strength of relationships between aural, tactile,
visual, and kinesthetic sensations, as well as the myriad extramusical experiences that are invoked in
musical experience.
When we regard someone as "musical" we refer to their skill in communicating musical ideas through
a variety of related behaviors that include performing, listening, creating, notating, and so forth. While
traditional music instruction generally supports the development of these behaviors, ultimately the
ability to successfully communicate something meaningful requires that one make connections
between musical experiences and other kinds of experiences. We intend for this study to serve as an
example of how these connections might be approached didactically, thus allowing developing
musicians to think more carefully about musical expression and musical meaning. We propose that
future research be conceived with greater attention to the conditions and needs of music pedagogy.
Such research must emphasize the formulation and use of creative teaching strategies that encourage
students to explore the relationships between aural and other types of experience.
References
Bartlett, Dale (1996). Physiological responses to music and sound stimuli. In: D. A. Hodges
(ed.) Handbook of music psychology, San Antonio: IMR-Press, 343-385.
Behne, Klaus-Ernst (1992). Am Rande der Musik: Synästhesien, Bilder, Farben, ... Jahrbuch
Musikpsychologie 8, 94-120.
Bharucha, Jamshed (1995). Neural Nets and Music Cognition. In: R.Steinberg (ed.). Music and
the mind machine. The psychophysiology and psychopathology of the sense of music. Berlin:
Springer, 199-204.
Cuddy, Lola L. (1994). Synästhesie. In H. Bruhn, R. Oerter and H. Rösing (eds.)
Musikpsychologie. Ein Handbuch. Reinbek: Rowohlt, 499-505.
de la Motte-Haber, Helga (1996). Handbuch Musikpsychologie, 2.Auflage. Laaber:
Laaber-Verlag.
Fisher, Kimberly V. and Barbara J. Parker (1994). A multisensory system for the development
of sound awareness and speech production. Journal of the Academy of Rehabilitative Audiology
27, 13-24.
Goldstein, Avram (1980). Thrills in response to music and other stimuli. Physiological
Psychology 8(1), 126-129.
James, Mark R. (1984). Sensory integration: A theory for therapy and research. Journal of
Music Therapy 21(2), 79-88.
Kleinen, Günter (1994). Die psychologische Wirklichkeit der Musik. Wahrnehmung und
Deutung im Alltag. Kassel: Bosse.
Kleinen, Günter (1999). Die Leistung der Sprache für das Verständnis musikalischer
Wahrnehmungsprozesse. Musikpsychologie 14, 52-68.
Krumhansl, Carol (1990). The cognitive foundations of musical pitch. Oxford: Oxford
University Press.
Krumhansl, C. & Schenck, D.L. (1997). Can dance reflect the structural and expressive
qualities of music? A perceptual experiment on Balanchine’s choreography of Mozart’s
Divertimento no. 15. Musicae Scientiae. 1, (1), 63-85.
Martino, Gail and Lawrence E. Marks (1999). Perceptual and linguistic interactions in speeded
classification: Tests of the semantic coding hypothesis. Perception 28(7), 903-924.
McKinney, Cathy H., Frederick C. Tims, Adarsh M. Kumar, and Mahendra Kumar (1997). The
effect of selected classical music and spontaneous imagery on plasma β-endorphin. Journal of
Behavioral Medicine 20(1), 85-99.
Osgood, Charles E. (1960). The cross-cultural generality of visual-verbal synesthetic
tendencies. Behavioral Science 5, 146-169.
Panksepp, Jaak (1995). The emotional sources of "chills" induced by music. Music Perception
13(2), 171-207.
Petsche, Hellmuth, Peter Richter and Oliver Filz (1995). EEG in music psychological studies.
In: R.Steinberg (ed.). Music and the mind machine. The psychophysiology and psychopathology
of the sense of music. Berlin: Springer, 205-214.
Sloboda, John A. (1991). Music structure and emotional response. Psychology of Music 19,
110-120.
Smets, G.J.F. and Overbeek, C.J. (1995). Expressing tastes in packages. Design Studies. 16,
349-365.
Verrillo, Ronald T. (1992). Vibration sensation in humans. Music Perception 9(3), 281-302.
Proceedings paper
Extended abstract
In previous work we elaborated a way to systematically investigate the perceptual division of a
continuous space of temporal patterns into discrete rhythmic categories (Aarts, Desain & Jansen, in
preparation; Aarts & Jansen, 1999). We studied the perception of short rhythmical patterns by
sampling the space of all possible patterns of three interonset-intervals (IOI), with a total duration of
one second. For this, musicians were asked to transcribe these short patterns of three IOIs (i.e. four
onsets) into music notation. It was shown that subjects were able to identify rhythmic categories as
regions of the rhythm space. Besides providing insight into the perception of rhythmic categories, the
results are essential to the design and construction of automatic music transcription systems (see Cemgil, Desain
& Kappen, in press).
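The systematic sampling described above can be sketched as a grid over triples of IOIs summing to one second (the 0.1 s grid spacing and minimum IOI used here are illustrative assumptions; the actual stimulus set is defined in the cited papers):

```python
def sample_rhythm_space(total=1.0, step=0.1, minimum=0.1):
    """Enumerate triples of inter-onset intervals (IOIs) summing to `total`,
    on a grid with the given step, each IOI at least `minimum`.
    The grid spacing and minimum duration are illustrative assumptions."""
    patterns = []
    n = round(total / step)     # number of grid steps in the total duration
    k = round(minimum / step)   # minimum number of steps per IOI
    for i in range(k, n - 2 * k + 1):
        for j in range(k, n - i - k + 1):
            # third IOI is whatever remains of the total duration
            patterns.append((i * step, j * step, (n - i - j) * step))
    return patterns

grid = sample_rhythm_space()
```

With these assumed parameters the grid contains 36 patterns; a finer grid would yield more stimuli, closer to the 66 shown in Figure 1.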
The current study was conducted to clarify how the shape of the rhythmic categories depends on
metric context and global tempo. Previous studies (e.g., Clarke, 1987; Schulze, 1989) already
addressed the effect of tempo and meter. However, these results were not unequivocal. Also,
generalization was limited due to a relatively small set of stimuli and a predefined set of responses
(arguably steering the subject to the available categories).
In this study we elaborated on the paradigm used in earlier studies of systematically sampling the
space of temporal patterns using open responses. To investigate the context effects of tempo and
meter we conducted two experiments in which a total of seventeen conservatory-trained musicians
transcribed aurally presented rhythmic patterns by means of a simple computer interface for music
notation.
In the first experiment the rhythms were presented at three tempi (40, 60, 90 BPM). The total stimulus
consisted of nine beats, marking eight measures. In the 3rd, 5th and 7th measure the rhythmical
pattern was presented. The length of a measure was, respectively, 1.5, 1.0 and 0.67 seconds in the three
tempo conditions. The beat and pattern were marked with different sounds (a high and a low conga, respectively).
In the second experiment, the same patterns were presented in the context of a duple meter and a triple
meter, at 60 BPM. The stimulus consisted of nine beats, marking eight measures. Each beat was
divided in two (for duple meter) or three (for triple meter) intervals marked with a softer sub-beat. In
the 3rd, 5th and 7th measure the rhythmical pattern was presented.
The results of the first experiment show how the rhythmical regions change shape in the different
tempi, depending on their complexity. In the second experiment, different rhythmic categories were
given as response depending on the metrical context, again in accordance with the relative rhythmic
complexity of the patterns in these meters. Metrical context determines what rhythmical categories
occur (see Figure 1), and tempo affects the size and shape of these categories. Notwithstanding these
clear results, the variance of the data between subjects reflects the complexity of decision making by
musicians in quantizing rhythmical patterns. The consequences of these findings for existing models
of rhythmic quantization are the topic of further research.
Figure 1: Ternary plot of the responses (drawn as a line from stimulus to response; line-width
indicates the number of equal responses) for 66 stimuli (open circles) consisting of three IOIs
presented in a duple meter context (left) and in a triple meter context (right).
The direction of a tick mark indicates to which grid line it belongs. For example, the stimulus in
the middle of the rhythm space is 1/3-1/3-1/3, i.e. 1:1:1.
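The placement of an IOI triple in such a ternary plot can be sketched with a standard barycentric-to-Cartesian conversion (the axis orientation used in the actual figure is an assumption here):

```python
import math

def ternary_xy(ioi):
    """Map a triple of inter-onset intervals (with any total duration)
    to Cartesian coordinates in a unit-side ternary plot.
    Standard barycentric-to-Cartesian conversion; the axis orientation
    of the published figure may differ."""
    a, b, c = ioi
    total = a + b + c
    a, b, c = a / total, b / total, c / total  # normalise to proportions
    x = b + c / 2.0             # horizontal position
    y = c * math.sqrt(3) / 2.0  # vertical position
    return x, y

print(ternary_xy((1/3, 1/3, 1/3)))  # the centroid of the triangle
```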
These results show that metric context has a strong effect on perceived rhythmic categories.
For example, the rhythmic category 2:1:1 is only present in duple context, 1:3:2 only in the
triple condition.
Acknowledgement
This research has been made possible by the Netherlands Organization for Scientific Research (NWO)
as part of the "Music, Mind, Machine" project. More information can be found at
http://www.nici.kun.nl/mmm.
References
Aarts, R. and Jansen, C. (1999) Categorical perception of short rhythms. In Proceedings of the
1999 SMPC, p. 57. Evanston.
Aarts, R., Desain, P., and Jansen, C. (In Preparation) The quantization of short rhythmical
patterns.
Cemgil, T., Desain, P., and Kappen, B. (In Press) Rhythm quantization for transcription.
Computer Music Journal.
Clarke, E. F. (1987). Categorical rhythm perception: an ecological perspective. In A. Gabrielsson
(Ed.), Action and Perception in Rhythm and Music. Royal Swedish Academy of Music, 55.
Schulze, H. H. (1989). Categorical perception of rhythmic patterns. Psychological Research,
51, 2-9.
Proceedings paper
Aaron Williamon
Royal College of Music, Prince Consort Road, London, SW7 2BS, UK
Elizabeth Valentine
Department of Psychology, Royal Holloway, University of London,
Egham, Surrey, TW20 0EX, UK
Exceptional memory is a hallmark of expertise. The demands placed on memory during musical
performance, for example, are remarkable, sometimes requiring the reproduction of over 1000 notes a
minute for periods of up to 50 minutes. Unsurprisingly, performers often accrue hours of extra practice on
a composition, developing multiple retrieval systems that will permit a performance to continue come what
may (Chaffin & Imreh, 1997). Chase and Ericsson’s (1982) Skilled Memory Theory has commonly been
accepted as accounting for expert memory (Schneider & Detweiler, 1987; Carpenter & Just, 1989;
Anderson, 1990; Baddeley, 1990; Newell, 1990; Ericsson & Kintsch, 1995). The theory proposes that
outstanding memory abilities arise from the creation and efficient use of "retrieval structures". These
structures can only be attained under restricted circumstances. First, individuals must be able to store
information in LTM rapidly. This requires a large body of relevant knowledge and patterns for the specific
type of information involved. Secondly, the activity must be familiar, so that individuals can anticipate
future demands for the retrieval of relevant information. Finally, individuals must associate the encoded
information with appropriate retrieval cues. This association adheres to hierarchical and serial principles
and permits the activation of a particular retrieval cue at a later time, thus partially reinstating the
conditions of encoding so that the desired information can be retrieved from LTM (for discussions of
hierarchical and serial principles see Collins & Quillian, 1969, 1970; Smith et al., 1974; Rosch et al., 1976;
Kosslyn, 1980, 1981, 1987, 1994; Pylyshyn, 1981, 1984; Hinton et al., 1986; Sellen & Norman, 1992;
Shaffer, 1976; Sternberg et al., 1978; Sternberg et al., 1988; Palmer & van de Sande, 1995).
Only after a set of retrieval cues is organised into a stable structure is a retrieval structure formed, thereby
enabling individuals to "retrieve stored information efficiently without lengthy search" (Ericsson &
Staszewski, 1989, p. 239). Several researchers have voiced doubts about the theory’s generalisability to
working memory (see Baddeley, 1990; Schneider & Detweiler, 1987). Consequently, Ericsson and Kintsch
(1995) have extended the theory into the Long-Term Working Memory Theory, asserting that "storage in
working memory can be increased and is one of many skills individuals attain during the acquisition of
skilled performance" (p. 220). Nevertheless, their description of the cognitive mechanisms that permit
information to be extracted for memorised performances (i.e. retrieval structures) remains unaltered.
Chaffin and Imreh (1994, 1996a, 1996b, 1997) systematically observed the practice of a concert pianist to
determine whether she used the kind of highly practised, hierarchical retrieval structure described by Chase
and Ericsson (1982) to memorise and perform the Presto from Bach’s Italian Concerto. Practice for this
piece was divided into 58 sessions, aggregated into three learning periods and spread over ten months.
Sessions were video-taped, and cumulative records were created showing the pianist’s starting and
stopping points in the music. They also examined the pianist’s concurrent and retrospective commentary
on her practice. Chaffin and Imreh confirmed the prediction that the concert pianist would use a
hierarchically ordered retrieval structure to recall encoded information. Moreover, they found that she
organised her practice and subsequent retrieval of the Presto according to its formal structure. The number
of practice segments that started and stopped at boundaries in the formal structure during practice was
significantly higher than the number that started and stopped at other locations.
In a follow-up study, the pianist was asked to write out the first page (i.e. 32 bars) of the score from
memory two years after the project. She was not informed before the project that she would be asked to
perform this task. During the interval between the performance of the piece and recall, the pianist did not
practise or perform the Presto. The researchers found that recall was significantly better for the bars
beginning each section than for bars at other locations, confirming, once again, that the hierarchical
components of the music’s formal structure formed an enduring foundation for the pianist’s retrieval
structure.
Chaffin and Imreh’s (1994, 1996a, 1996b, 1997) work is the first to demonstrate that the principles of
expert memory apply to concert soloists. Subsequent investigations, however, should examine this issue
across more than just one performer and across several levels of skill. Such research would test the
generalisability of their findings and indicate how the implementation of hierarchical retrieval schemes in
practice develops as a function of expertise. This paper details such an investigation, in which the practice
of 22 pianists at four levels of skill was examined as they prepared an assigned composition for a
memorised performance.
Method
The Musicians
Six piano teachers from southeast England were asked to recommend students capable of learning and
performing a selected piece of music suited to their level of ability from memory. Thirty-seven pianists
were recruited for the study. Of those 37, a complete set of data was collected and analysed for 22
participants. Participation was strictly voluntary but encouraged by the piano teachers because the
conditions of participation were seen to contribute to students’ overall musicianship by providing
invaluable and challenging performance experience.
The participating pianists were classified into four levels of ability based on the grading system set forth by
the Associated Board of the Royal Schools of Music (see Harvey, 1994). This system contains eight
grades, with Grade 1 representing the lowest level of skill and Grade 8 representing the highest. Musicians
at Grade 8 are usually considered to possess high performance standards, though falling short of expertise.
The four levels span all eight grades and were stratified as follows: pianists of Grade 1 & 2 standard were
placed in Level 1 (2 male, 3 female); Grade 3 & 4 in Level 2 (3 male, 3 female); Grade 5 & 6 in Level 3 (2
male, 4 female); and Grade 7 & 8 in Level 4 (5 female). (Williamon & Valentine, 2000, found that the
musicians within each of these levels were adequately comparable in terms of overall musical competence
and training so as to satisfy the requirements of the normal distribution. They also found that age
significantly differed between levels; therefore, age will be entered as a covariate in all between-level
comparisons in this paper).
The Music
The pianists were assigned one piece of music appropriate to their level of ability. All selected pieces were
composed by J.S. Bach. The compositions for Levels 1 to 4 were, respectively, the Polonaise in G Minor
from the Anna Magdalena Notebook (BWV Anh. 119), the Two Part Invention in C Major (BWV 772), the
Three Part Invention in B Minor (BWV 801), and the Fugue in D Minor from the Well-Tempered Clavier I
(BWV 851; Level 4 pianists also prepared the Prelude in D Minor, but the values reported in this paper
were obtained for the Fugue only).
Procedure
The pianists were asked to record all practice for their assigned piece on cassette tape. The participants
were invited to comment, either on tape or in writing, on any relevant aspect of the learning process (these
comments were subsequently transcribed by the first author). In addition, pianists were asked to note and
describe all practice carried out away from the piano, including singing the music and analysing the score.
Participants were informed at the outset of the study that they would be required to perform the assigned
piece from memory in a recital setting, attended by their teachers, parents and fellow music students. The
time and location of each recital were arranged by the respective music teacher as part of students’ regular
curriculum. No restrictions on the amount of time or the number of practice sessions were placed upon the
pianists, except for those normally imposed by themselves or their music teachers. Following each
performance, the pianists were interviewed about the practice and memorisation process and asked to
comment on the project itself, including its design and implementation.
Cumulative Records
The recorded practice sessions were transcribed into cumulative records for each pianist. These records
contained both quantitative and qualitative information on the learning process as a whole and on each
individual practice session (a practice session was defined as a discrete period of time, of variable length,
in which musicians practised the assigned composition either at or away from the piano). The records
documented characteristics of practice such as the total time spent practising, the number of days
encompassed within the learning process (from the first practice session up to the final performance), the
number of practice sessions in the learning process, the number of practice sessions per day and the time
spent in each practice session. In addition, graphs were plotted for each practice session showing starting
and stopping points for the segments of music played by each pianist. The graphs − with the x-axis
representing bars of the music and the y-axis depicting the cumulative number of practice segments −
represent the sequence of segments of the music executed by a particular pianist during a given practice
session. Such graphs were originally introduced by Chaffin and Imreh (1994, 1996a, 1996b, 1997). The
pianists often corrected one or two notes whilst continuing to play through the music. This type of practice
is analogous to a stutter in speech. Such stutters were not included in the cumulative records. All graphs
were transcribed from the cassette tapes by the first author.
Results
Segmenting the Assigned Compositions
Following each performance, the musicians were interviewed. One set of interview questions required that
the pianists indicate whether they had thought of their assigned composition as having component sections
during both practice and performance, and if so, why and how they partitioned it. In another set of
questions, they were asked to identify the bars in which difficult passages occurred in the music and
explain why they were difficult. These were open ended questions, not intended to lead the pianists into
particular answers, such as the identification of the music’s formal structure or the cataloguing of
difficulties into specific types. In sum, the findings reveal that participants segmented their assigned
composition into various hierarchical organisations, not always coincident with the formal structure.
Moreover, the interview data indicate that the higher skilled musicians demonstrated an extended use of
hierarchy when reporting information about local detail in their composition (i.e. difficult bars).
The Role of Segmentation in Practice
To determine the extent to which segmentation played a role in guiding the practice of the musicians, bars
of the assigned compositions were categorised as "structural," "difficult" or "other". Unlike Chaffin and
Imreh’s (1994, 1996a, 1996b, 1997) study, the formal structure of the music was not used as the basis of
this categorisation because only three pianists (one in Level 3 and two in Level 4) reported that the formal
structure influenced their segmentation of the assigned piece. Instead, the categorisation system was based
on the pianists’ individual-specific segmentation of the music and identification of difficult bars. Bars were
classified as "structural" if they were the first bar in each of the identified sections and subsections. They
were labelled as "difficult" if they had been previously named as such by the pianists. No differentiation
was made between types of difficulty. All remaining bars were placed into the "other" group. In four cases,
two pianists in Level 2, one in Level 3 and one in Level 4 labelled one bar in their composition as both
structural and difficult. In these instances, the bars were omitted entirely from subsequent analyses. Also,
the frequency of the first bar from each assigned composition was excluded from all analyses. Obviously,
the first bar of a piece may play a guiding role in any hierarchical retrieval scheme; however, it was
excluded because of the multitude of reasons as to why musicians may decide to start their practice at the
beginning of a piece. Two of these are that musical information is organised linearly and that any attempt
to simulate a complete performance is likely to begin on the first bar.
Using this classification system, the frequency with which pianists started their practice on structural,
difficult and other bars was obtained for each practice session. Initially, these frequencies were to be
compared both within and between ability levels, but closer inspection of the values revealed that such
comparisons were not valid for three reasons. In terms of within-level comparisons, the number of
structural, difficult and other bars identified by each pianist varied considerably. Consequently, the
resulting frequencies may have increased or decreased based on the number of each type of bar. As for
between-level comparisons, the findings of Williamon and Valentine (2000) reveal that pianists in the
sample at higher ability levels spent more time practising in each practice session. As a result, they may
have started practice on structural, difficult and other bars more often than pianists at lower levels of ability
in this extra time. Also, the number of bars in each assigned piece was different. Therefore, in the
hypothetical situation that all bars were equally important in terms of encoding and retrieving musical
information, the probability of the pianists starting their practice on any one bar would decrease with an
increase in the length of the piece.
To account for these within- and between-level inconsistencies, a measure was calculated reflecting the
deviation between (1) the observed frequencies of starts on structural, difficult and other bars and (2) the
expected frequencies based on the number of each type of bar identified and the number of bars in the
assigned piece. The equations by which these calculations were performed were derived from that used to
calculate expected frequencies in the Chi-squared test (see Goodman, 1957; Kendall & Stuart, 1963). The
calculated values (referred to from here on as δ s for structural bars, δ d for difficult bars and δ o for other
bars) give an equivalent of "z" scores, by which positive integers indicate more starts on a specific bar type
than would be expected and negative integers indicate fewer starts on a specific bar type than would be
expected. The equations and calculations used to obtain δ s, δ d and δ o for each pianist in each practice
session are as follows:
The Equations
Measure of Deviation of Observed "Structural" Starts from Expected "Structural" Starts
δ s = (fsi − esi) / √esi
Measure of Deviation of Observed "Difficult" Starts from Expected "Difficult" Starts
δ d = (fdi − edi) / √edi
Measure of Deviation of Observed "Other" Starts from Expected "Other" Starts
δ o = (foi − eoi) / √eoi
The Calculations
Step 1: The proportion of "structural," "difficult" and "other" bars to the total number of bars
❍ nsi = number of structural bars identified by pianist "i"; ndi and noi are defined analogously; Ni = total number of bars in the assigned piece
❍ psi = nsi / Ni (proportion structural)
❍ pdi = ndi / Ni (proportion difficult)
❍ poi = noi / Ni (proportion other)
Step 2: The number of actual starts on "structural," "difficult" and "other" bars
❍ fsi = number of observed starts on structural bars for pianist "i"; fdi and foi are the observed starts on difficult and other bars; Mi = fsi + fdi + foi (total starts in the session)
Step 3: The number of expected starts on "structural," "difficult" and "other" bars
❍ esi = psi x Mi (number of expected structural starts)
❍ edi = pdi x Mi (number of expected difficult starts)
❍ eoi = poi x Mi (number of expected other starts)
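Assuming the deviation measure is the standard chi-squared-style standardised residual (observed minus expected starts, divided by the square root of the expected count), consistent with the text's description of the values as "z"-score equivalents, the per-session computation can be sketched as follows (the bar and start counts are invented):

```python
import math

def deltas(n_counts, total_bars, f_counts):
    """Deviation of observed from expected starts per bar type.
    Assumes a chi-squared-style standardised residual, (f - e) / sqrt(e);
    positive values mean more starts than expected, negative fewer."""
    total_starts = sum(f_counts)  # Mi: all starts in the session
    out = []
    for n, f in zip(n_counts, f_counts):
        p = n / total_bars         # proportion of this bar type in the piece
        e = p * total_starts       # expected starts on this bar type
        out.append((f - e) / math.sqrt(e))
    return out

# Invented session: 4 structural, 6 difficult, 22 other bars (32 in total),
# with 10, 8 and 6 observed starts respectively.
ds, dd, do = deltas([4, 6, 22], 32, [10, 8, 6])
```

In this invented session the structural starts far exceed expectation (positive δ s) while the "other" starts fall below it (negative δ o), the pattern the Results section reports for the actual data.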
The means for δ s, δ d and δ o across all practice sessions for each level of ability are listed in Table 1.
These values for the deviation of the observed from expected frequencies for each bar type were compared
using a two-factor mixed analysis of covariance (ANCOVA) with deviation as the dependent variable, bar
type (i.e. structural, difficult and other) as the within-subjects independent variable, level as the
between-subjects independent variable, and age as the covariate. The analyses revealed significant main
effects of bar type [F (2,34)=262.25, p<0.001] and level [F (3,17)=58.88, p<0.001] and a significant
interaction between bar type and level [F (6,34)=265.22, p<0.001]. Subsequent polynomial contrasts (i.e.
comparisons of δs versus δd and δs & δd combined versus δo) further defined these findings in that
structural starts were more frequent than difficult starts [t (17)=225.16, p<0.001] and structural and
difficult starts combined were more frequent than other starts [t (17)=373.91, p<0.001]. These contrasts
also revealed that the predominance of structural starts over difficult starts was greater for pianists at higher
levels of ability [t (17)=263.33, p<0.001].
To explore the extent to which structural and difficult bars guided the pianists’ practice throughout the
learning process, the deviations of the observed from expected frequency of structural and difficult starts
were examined at three discrete stages of practice. Stage 1 included values for each pianist’s first three
practice sessions; Stage 2 included values for the middle three practice sessions; Stage 3 included values
for the last three practice sessions. The mean values for δs and δd at the three stages for each ability level
are also displayed in Table 1.
Table 1. Means for δs, δd and δo for each level of ability at each Stage of practice.
The values were analysed by two two-factor mixed ANCOVAs with the deviation of the observed from
expected starts on structural and difficult bars as the respective dependent variables, stage as the
within-subjects independent variable, level as the between-subjects independent variable and age as the
covariate. For δs, the analyses revealed significant main effects of stage [F (2,34)=346.70, p<0.001] and
level [F (3,17)=529.87, p<0.001] and a significant interaction between stage and level [F (6,34)=114.90,
p<0.001]. Further polynomial contrasts (i.e. comparisons of Stage 1 versus Stage 3 and Stages 1 & 3
combined versus Stage 2) revealed that these deviations increased linearly across the practice process
[t (17)=912.49, p<0.001] and that they increased most for pianists at higher levels of ability [t (17)=263.34,
p<0.001]. For δd, the analyses revealed significant main effects of stage [F (2,34)=64.79, p<0.001] and level [F
(3,17)=62.44, p<0.001] and a significant interaction between stage and level [F (6,34)=78.21, p<0.001].
Further polynomial contrasts (i.e. comparisons of Stage 1 versus Stage 3 and Stages 1 & 3 combined
versus Stage 2) revealed that these deviations were greater in Stage 1 than in Stage 3 [t (17)=96.22,
p<0.001] and that this decrease occurred most for pianists at higher levels of ability [t (17)=59.61,
p<0.001].
Discussion
In sum, the data indicate that (1) pianists at all ability levels started their practice on "structural" bars more
frequently than on "difficult" and "other" bars, (2) the overall use of structural bars in starting practice
segments increased significantly with stage of practice and ability level and (3) pianists started their
practice on difficult bars less frequently from Stage 1 to 3. The results suggest, therefore, that the
identification and implementation of structure in guiding practice is a salient characteristic of musical skill
and becomes even more so as a function of expertise. Moreover, they demonstrate that the influence of
difficult bars in directing practice was increasingly replaced by the use of structural bars to guide rehearsal
(i.e. "difficult" starts significantly decreased across the three stages; "structural" starts significantly
increased across the three stages).
The findings give some insight into the use of "structure" in guiding performance. Existing research
suggests that if individuals use a retrieval scheme during performance, then they must use the same scheme
to encode the information (Tulving & Pearlstone, 1967; Baddeley, 1990) and must practise using it to
guide retrieval (Ericsson & Kintsch, 1995). Considering (1) that the identification of structural bars was
based on the pianists’ reports of sections in the music that were important in both practice and performance
and (2) that these bars were increasingly exploited across the practice process, one could argue that this
exploitation was not only important for the encoding of musical information but also for its retrieval.
In general, these findings support the arguments of Chase and Ericsson (1982) and Ericsson and Kintsch
(1995) in that musical performers appear to implement hierarchical retrieval structures in practice so that
they may use them to guide retrieval in performance. This study goes beyond existing research by
examining retrieval structures across more than just one performer and across several levels of skill,
demonstrating the generalisability of Chaffin and Imreh’s (1994, 1996a, 1996b, 1997) findings to other
skilled musicians and documenting how the exploitation of hierarchical retrieval schemes in practice and
performance develops as a function of expertise.
References
Allard, F., Graham, S., & Paarsalu, M. E. (1980). Perception in sport: Basketball. Journal of
Sport Psychology, 2, 14-21.
Anderson, J. R. (1990). Cognitive Psychology and Its Implications. San Francisco: Freeman.
Baddeley, A. D. (1990). Human Memory: Theory and Practice. Boston: Allyn & Bacon.
Carpenter, P. A., & Just, M. A. (1989). The role of working memory in language
comprehension. In D. Klahr & K. Kotovsky (Eds.), Complex Information Processing: The
Impact of Herbert A. Simon. Hillsdale, NJ: Lawrence Erlbaum Associates.
Chaffin, R., & Imreh, G. (1994). Memorizing for performance: a case study of expert
memory. Paper presented at the Third Practical Aspects of Memory Conference. University of
Maryland.
Chaffin, R., & Imreh, G. (1996a). Effects of difficulty on practice: a case study of a concert
pianist. Poster presented at the Fourth International Conference on Music Perception and
Cognition. McGill University: Montreal, Canada.
Chaffin, R., & Imreh, G. (1996b). Effects of musical complexity on expert practice: a case
study of a concert pianist. Poster presented at the Meeting of the Psychonomic Society.
Chicago, IL.
Chaffin, R., & Imreh, G. (1997). "Pulling Teeth and Torture": Musical Memory and Problem
Solving. Thinking and Reasoning, 3, 315-336.
Chase, W. G., & Ericsson, K. A. (1982). Skill and working memory. In G. H. Bower (Ed.),
The Psychology of Learning and Motivation (Vol. 16). New York: Academic Press.
Chase, W. G., & Simon, H. A. (1973a). Perception in chess. Cognitive Psychology, 4, 55-81.
Chase, W. G., & Simon, H. A. (1973b). The mind’s eye in chess. In W. G. Chase (Ed.), Visual
Information Processing. New York: Academic Press.
Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of
Verbal Learning and Verbal Behavior, 8, 240-248.
Collins, A. M., & Quillian, M. R. (1970). Does category size affect categorisation time?
Journal of Verbal Learning and Verbal Behavior, 9, 432-438.
de Groot, A. (1946/1978). Thought and Choice in Chess. The Hague: Mouton. (Original work
published in 1946).
Deakin, J. M. (1987). Cognitive Components of Skill in Figure Skating. Unpublished PhD
thesis. University of Waterloo.
Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review,
102, 211-245.
Ericsson, K. A., Krampe, R. Th., & Tesch-Römer, C. (1993). The role of deliberate practice in
the acquisition of expert performance. Psychological Review, 100, 363-406.
Ericsson, K. A., & Staszewski, J. J. (1989). Skilled memory and expertise: Mechanisms of
exceptional performance. In D. Klahr & K. Kotovsky (Eds.), Complex Information
Processing: The Impact of Herbert A. Simon. Hillsdale, NJ: Lawrence Erlbaum Associates.
Goodman, R. (1957). Teach Yourself Statistics. London: English University Press.
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In
D. E. Rumelhart, J. L McClelland & The PDP Research Group (Eds.), Parallel Distributed
Processing: Foundations (Vol. 1). Cambridge, MA: MIT Press.
Kendall, M. G., & Stuart, A. (1963). The Advanced Theory of Statistics (Vol. 1). London:
Charles Griffin and Company.
Kosslyn, S. M. (1980). Image and Mind. Cambridge, MA: Harvard University Press.
Kosslyn, S. M. (1981). The medium and the message in mental imagery: A theory.
Psychological Review, 88, 44-66.
Proceedings paper
Introduction
Modern technological advances mean that now, more than at any other time in history, music is
pervasive and functions not only as a pleasurable art form, but surrounds many activities of daily life
(Mertz, 1998). Indeed, a recognition of the ubiquitous presence of music has been one of the factors
motivating significant research interest in the effects of music listening on a range of psychological
and physiological variables. Consequently, there has been significant debate within the psychology of
music literature, with a wide range of effects of music listening suggested by researchers (Overy, 1999).
These include effects on cognitive skills (Rauscher, 2000), on heart rate and blood pressure (Aldridge,
1996), on cortisol and norepinephrine levels (Vander Ark & Mostardi, 2000), on emotional responses
(Radocy & Boyle, 1997) and on consumer behaviour (North and Hargreaves, 1997).
Recent advances in the psychology of music have begun to highlight the importance of researching
the impact of factors beyond the nature of the music itself on music perception. Issues such as social,
environmental and cultural influences have been investigated, but there is a need for more research
which focuses on the impact these factors might have (Hodges and Haack, 1996; Miell and
MacDonald, in press). Social and cultural influences such as peer groups, the family and the listening
context have been highlighted as key areas of research interest (Hargreaves and North, 1997) and it is
important to further develop our knowledge of how these and other extra-musical factors impact upon
our responses to musical stimuli.
When examining the role of these extra-musical influences it is important to consider the impact of
popular music culture. Through, for example, television, radio, the film industry, magazines etc
popular music culture plays an influential role in everyday life (Frith, 1987). Such is the impact of
popular music culture that it could be seen as constituting an 'informal learning environment' - that is, a
means through which we all learn and develop our preferences for music (Folkestad, 1998). One
example of the influence of popular music culture can be seen in how children develop their
musical skills, preferences and knowledge. Some authors have speculated on whether these informal
learning environments have more to do with a child's developing musicality than the conventional
classroom setting for music teaching (Folkestad, 1998). Folkestad suggests that previous research has
tended to view a child's musical development as intrinsically linked to the school or institutional
environment and he argues that, instead, the impact of popular music culture is such that we need to
move away from this narrow definition of musical development and include an analysis of cultural
influences such as pop music on the child's developing musicality. This has direct relevance for music
education research, as it is important to consider this wider social context in which children listen to,
play and learn about music (Miell and MacDonald, in press). These research priorities have relevance
not only for children but also for adults.
The points raised above have set a context highlighting the extra-musical aspects of music perception,
emphasising the role that popular culture has in this process. At a more specific level, it is also
important to note that popular music is the predominant genre in modern popular culture.
Its centrality in the personal identity of adolescents has been reported by Zillmann and Gan (1997)
and, as a pervasive art form, it is present while we perform many day-to-day activities, from shopping to
drinking in a bar (Hodges and Haack, 1996). However, as Hargreaves and North (1997) suggest, this
influence is not reflected within the research literature that investigates the psychology of music at a
general level or the effects of music listening at a more specific level. The aim of this paper is to
highlight, from a psychological perspective, the importance that popular music should have for
researchers interested in the psychological effects of music. I will do this by reviewing existing
literature and presenting two studies that have relevance to this topical issue.
Method
Overviews of two music listening studies are presented here. These two studies are chosen as they
involve investigations of the effects of listening specifically to popular music, extending similar
studies of listening which have been undertaken using classical music. The first study was a small
scale student project while the second study was part of a project funded by the Scottish Executive
investigating the effects of listening to music in a hospital setting for the purposes of pain and anxiety
reduction.
Study one
This study investigated the emotional changes taking place after listening to different types of music
and also compared the effect of listening to music in different locations. Previous research has
suggested that the listening context is a crucial variable in how we interpret music
(North and Hargreaves, 1997). We were interested in comparing emotional responses to popular versus
classical music. While there is a long history of research investigating emotional responses to music
(Meyer, 1956; Sloboda, 1985), highlighting issues such as the emotionality embedded in structural features
of music, it is suggested here that much of this research has neglected popular music.
Thirty participants listened to 4 pieces of popular music and 4 classical pieces. The four pieces in the
pop category were: The Verve, Bitter Sweet Symphony; Robbie Williams, Let Me Entertain You;
Massive Attack, Teardrop; and Skunk Anansie, Selling Jesus. The four pieces in the classical condition
were extracts from Tchaikovsky's Symphony No.6., Mozart's Magic Flute, Bellini's Norma and
Handel's Hallelujah Chorus. The experiment was counter-balanced to ensure that participants listened
to the pieces in different orders and half the participants listened to the music in their home setting
first whilst half listened to the pieces in a lab setting first.
Participants completed a Profile of Mood States scale (POMS; McNair, Lorr, & Droppleman, 1992) before
and after listening to each piece and an aggregate of mood change was generated. The results
demonstrated that, in general, listening to the popular music produced significantly greater changes in
mood in comparison to the classical music (t(29)=2.51, p<.05). The direction of the emotional change
was dependent upon the piece. For example, listening to the Robbie Williams track, 'Let me entertain
you' produced the biggest positive change within the 'vigour' category of the POMS. When comparing
the changes of mood in participants listening to music in the home compared to the lab, participants
listening to music in the home produced significantly greater changes in mood (t(29)=3.79, p<.01). In
cases where there was a significant difference, more extreme effects of the same emotion were
obtained when participants were in the home as opposed to in the lab.
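As a rough sketch of the kind of paired comparison reported here, the following computes a paired t statistic on hypothetical aggregate POMS change scores; the numbers are illustrative only, not the study's data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic for two matched sets of scores."""
    diffs = [a - b for a, b in zip(x, y)]
    # t = mean difference divided by its standard error
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical aggregate mood-change scores for six listeners,
# one value per listener for each genre (illustrative numbers).
pop_change = [8, 5, 7, 6, 9, 4]
classical_change = [3, 2, 4, 5, 3, 2]

t = paired_t(pop_change, classical_change)  # positive t: larger change for pop
```

With real data the statistic would be compared against the t distribution with n - 1 degrees of freedom, as in the t(29) values reported above.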
The participants (all undergraduate students) were, in general, more familiar with the pieces in the
popular category and expressed more liking for them than the classical compositions. One of the
purposes of the experiment was to present pre-selected music in order to make comparisons of
emotional responses at a general level and this result demonstrates that participants were more likely
to prefer and be more familiar with popular music. Familiarity with the music is obviously an
influential factor in determining an individual's emotional responses to a piece of music. However, in
many studies the popular genre is not presented at all, and issues of personal preference and familiarity
are often ignored. The implications of this for results of existing research will be considered further in
the presentation. In terms of the results of this study, participants were more familiar with popular
music, they appeared to like this style of music over the classical and it produced the biggest
emotional changes for the listeners. In addition, listening to the music in their home environment also
produced a bigger emotional change than listening to the music in the laboratory setting. Thus the
listening context is very important to take into consideration when investigating emotional responses
to music, highlighting the importance of environmental and extra-musical factors.
Study Two
In this study participants undergoing surgical operations were invited to select music from their
personal collection to listen to during the post-operative recovery period (approximately 24 hours).
Many authors, from a range of health care professions, advocate a multidisciplinary approach to pain
management that encourages the use of non-pharmacological and non-invasive strategies such as
relaxation, biofeedback and psychological coping techniques (Standley, 1986; 1995). It has been
suggested that music may play a key role in helping to reduce perceptions of pain for patients in
hospital settings (Chesky, Michel and Kondraske, 1996). These findings are in accord with
meta-analyses reported by Standley (1986, 1995). In addition, a recent survey of music therapists
(n=348) by the American National Association of Music Therapy (NAMT) reported that 45% of
respondents used music specifically for pain management with elderly individuals, individuals in a
hospital setting and physically disabled populations (Michel & Chesky, 1996). Gardner and Licklider
(1960) report an empirical study in which listening to music had a significant effect on pain
perception. In their experiment, 25% of the participants (n=1000) reported that, as a result of listening
to music, they did not require analgesia during dental treatment. Herth (1978) also reports reduced
analgesic requirements in hospitalized patients listening to music. Specific physiological effects of
listening to music in clinical settings have been investigated extensively by Spintge (1985), who
reports changes in patients listening to music, specifically, in a number of neuroendocrinological
measures, including blood pressure and plasma levels of noradrenaline. It is suggested that these
changes are indicative of an anxiolytic effect. Koch and Kain (1998) found that patients who listened
to music while undergoing urologic procedures required less intra-operative sedation and analgesia
than participants in a control group.
Although the number of empirical projects investigating the process and outcomes of this type of
intervention is growing, many authors have suggested that there is still a need for rigorous evaluation
studies (MacDonald, O'Donnell and Davies, 1999; Radhakishnan, 1991; Purdie, 1997; Standley,
1992). Such evaluations can help us understand in more detail the specific effects that music listening
can have within clinical and hospital settings. This study is focussed on evaluating the possible
therapeutic effects of music listening. In many of the experimental studies cited above the precise
nature of the music is not specified and music is pre-selected with an assumption that playing this
pre-selected music will be relaxing for all participants. It is important that we have a clearer
knowledge about the types of music chosen and the individual interpretations and evaluations made
by participants. For this study, the music was selected by the participants from their own personal
collections. Thus, we were interested in subjective responses to particular pieces of music and not
responses to pre-selected music chosen for its apparently general relaxing or pain-reducing
effects. With these issues in mind, this experiment was designed to investigate whether
music listening could have a beneficial effect for patients undergoing minor operations in terms of
reduced anxiety perceptions.
Participants
There were 17 participants in an experimental group (11 females & 6 males), and 23 participants in a
control group (14 females & 9 males).
Equipment
The Spielberger trait anxiety inventory was used to measure participants' levels of anxiety
(Spielberger, 1983). This assessment instrument is a commonly used and well validated measure of
anxiety (Anastasi, 1990).
Procedure
Patients wishing to take part in the study were asked to sign a consent form and randomly assigned to
either the experimental or the control group. Participants in the experimental group were asked to
select an audiocassette(s) from their personal collection to listen to via personal stereo after the
operation. On the day of the operation, baseline measurements were made on all assessment
instruments prior to the operation (Time 1). All participants then underwent minor foot surgery and,
shortly after the operation, measurements were once again taken (Time 2). Participants were again
assessed on the measurement instruments 4 hours after the operation (Time 3). Although patients were
not required to listen to their music for a set period of time, they were encouraged to listen to the
music as much as possible. All patients listened to music for at least 45 minutes during the 4-hour
postoperative assessment phase. Descriptive statistics for the experimental and control group are
presented in Table 1. A 3 (Time 1, Time 2, Time 3) x 2 (Control, Experimental) ANOVA
produced a significant interaction effect [F(2,76)=65.36, p<.01], highlighting the anxiolytic effects of
music listening.
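To illustrate the shape of such a Time x Group interaction without reproducing the full ANOVA, the sketch below computes cell means for a hypothetical 3 x 2 layout of anxiety scores in which the experimental (music) group's anxiety falls more sharply across the three assessment points. All numbers are invented for illustration.

```python
# Hypothetical state-anxiety scores (three patients per cell) at the three
# assessment points; illustrative only, not the study's data.
scores = {
    "control":      {"Time 1": [44, 46, 43], "Time 2": [41, 42, 40], "Time 3": [40, 41, 39]},
    "experimental": {"Time 1": [45, 44, 46], "Time 2": [36, 35, 37], "Time 3": [30, 31, 29]},
}

def cell_means(data):
    """Mean anxiety score for each group x time cell."""
    return {group: {t: sum(v) / len(v) for t, v in times.items()}
            for group, times in data.items()}

means = cell_means(scores)

# Pre-to-post drop in anxiety for each group; an interaction shows up as
# a larger drop in one group than the other.
drop = {g: means[g]["Time 1"] - means[g]["Time 3"] for g in means}
```

In this invented layout the experimental group's anxiety drops far more than the control group's, which is the crossover pattern a significant interaction term would reflect.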
Table 1. Descriptive statistics for the control and experimental groups.
associational meaning for the participants, in that the piece of music selected reminded the
participants of happy memories or positive, relaxed, calm feelings. Indeed, analysis of qualitative data
obtained by asking the participants why they selected their chosen music supports this suggestion.
Many participants said that they selected a particular piece of music because it would help them to
relax in the hospital, as it was a piece they particularly enjoyed listening to at home. The wider
implications of these findings for other studies of music listening will be discussed.
Conclusion
Research that investigates the cognitive and emotional effects of music should take into consideration
that popular music is pervasive and in many cases will be the preferred style of music for participants
to listen to. For example, experiment two highlighted that when given the opportunity to select music
for explicitly clinical purposes, i.e. pain and anxiety reduction, participants overwhelmingly chose
popular music, but that whatever music they chose (i.e. whichever genre they preferred) had beneficial
effects on their perceptions of anxiety. Asking participants about their musical preferences during
these types of study should therefore be an important feature of this type of work. Although studies do
demonstrate that one particular piece of music can have a very specific general effect, it is important to
take personal preference into account.
Reductionist interpretations of music cognition that view music as a unitary stimulus that will have
certain general effects at perceptual, cognitive and even neurological levels should take these issues
into consideration. The complex nature of music listening involves not only structural features of the
music but also social, cultural and environmental factors and, while the precise psychological
mechanisms involved in how individuals interpret music remain to be explored in more detail, these
results do emphasise the extra-musical (e.g. learned and associational) aspects of music listening as
opposed to the intra-musical (e.g. structural) features.
It is important for researchers to consider the impact of popular culture and popular music. As
previously suggested, we are all exposed to these influences, and they can be thought of as constituting
an informal learning environment. This exposure, as well as experiences with family and friends, plays
an important role in developing an individual's specific musical preferences and abilities.
Consequently, research can be enriched by including some analysis of these influences in undertaking
studies of music perception.
References
Aldridge, D. (1996). Music Therapy Research and Practice in Medicine: From Out of the Silence.
London: Jessica Kingsley.
Anastasi, A. (1990). Psychological Testing. New York: Maxwell Macmillan.
Chesky K. S., Michel, D. D., & Kondraske, G. V. (1996). Developing methods and techniques for
scientific and medical applications of music vibration. In R. R. Pratt & R. Spintge (Eds.),
MusicMedicine ,vol. 2 (pp. 227-241). Saint Louis: MMB
Folkestad, G. (1998). Musical learning as cultural practice as exemplified in computer-based creative
music making. In B. Sundin, G. E. McPherson & G. Folkestad (Eds.), Children Composing (pp.
97-135). Malmö: Lund University.
Frith, S. (1987). The industrialization of popular music. In J. Lull (Ed.), Popular Music and
Communication (pp. 53-77). CA: Sage.
Gardner, W. J., & Licklider, J. C. R. (1960). Auditory analgesia in dental operations. The Journal of
the American Dental Association, 59, 1144-1149.
Hargreaves, D. J., & North, A. C. (Eds.) (1997). The Social Psychology of Music. Oxford: Oxford University Press.
Herth, K. (1978). The therapeutic use of music. Supervisor Nurse, 9, 22-23.
Hodges, D. A., & Haack, P. (1996). The influence of music on human behaviour. In D. A.
Hodges (Ed.), Handbook of Music Psychology (pp. 467-557). Texas: IMR.
Koch, M. E., & Kain, Z. N. (1998). The sedative and analgesic sparing effect of music. Anesthesiology,
89, 300-306.
MacDonald, R. A. R., O'Donnell, P. J., & Davies, J. B. (1999). Structured music workshops for
individuals with learning difficulty: an empirical investigation. Journal of Applied Research in
Intellectual Disabilities, 12(3), 225-241.
McNair, D. M., Lorr, M., & Droppleman, L. F. (1992). Profile of Mood States Manual. San Diego:
Educational and Industrial Testing Service.
Mertz, M. (1998) Some thoughts on music education in a global culture. International Journal of
Music Education, 32, 72 -78.
Melzack, R. (1980). Psychological aspects of pain. Pain, 8, 143-145
Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
Michel, D. E., & Chesky, K. S. (1996). Music and music vibration for pain relief: standards in
research. In R. R. Pratt & R. Spintge (Eds.), MusicMedicine, vol. 2 (pp. 218-226). Saint Louis: MMB.
Miell, D. & MacDonald, R.A.R. (in press). Children's creative collaborations: The importance of
friendship when working together on a musical composition. Social Development.
North, A. C., & Hargreaves, D. J. (1997). Music and consumer behaviour. In D. J. Hargreaves &
A. C. North (Eds.), The Social Psychology of Music (pp. 268-290). Oxford: Oxford University Press.
Overy, K. (1999). Can music really improve the mind? Psychology of Music, 26, 97-103.
Purdie, H. (1997). Music therapy with adults who have traumatic brain injury and stroke. British
Journal of Music Therapy, 11, 45-50.
Radhakishnan, G. (1991). Music therapy - A review. Health Bulletin, 49(3), 195-199.
Rauscher, F. (2000). Musical influences on spatial reasoning: experimental evidence of the Mozart
Effect. Paper presented at the biannual conference of the Society for Research in Psychology of Music
and Music Education, Leicester, UK.
Radocy, R.E. and Boyle, J.D. (1997) Psychological Foundations of Musical Behaviour, 3rd edition,
Springfield, Illinois: C.C. Thomas
Sloboda, J. A. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford: Clarendon
Press.
Spielberger, C. D. (1983) State trait anxiety inventory. Palo Alto, CA: Consulting Psychologists Press.
Standley, J. M. (1995). Music as a therapeutic intervention in medical and dental treatment: Research
and clinical applications. In T. Wigram, B. Saperston, & R. West, (Eds.), The art and science of music
therapy (pp. 3-22). London: Harwood Academic Publishers.
Standley, J. M. (1992). Meta-analysis of research in music and medical treatments: Effect size as a
basis for comparison across multiple dependent and independent variables. In R. Spintge & R. Droh,
(Eds.), MusicMedicine (pp. 345-349). Saint Louis: MMB.
Standley, J. M. (1986). Music research in medical/dental treatment: Meta analysis and clinical
applications. Journal of Music Therapy, 23, 56-122.
Spintge, R. (1985). Some neuroendocrinological effects of so-called anxiolytic music. International
Journal of Neurology, 19, 186-196.
Vander Ark, S. D., & Mostardi, R. (2000). Physiological effects of music on the human organism. Paper
presented at the biannual conference of the Society for Research in Psychology of Music and Music
Education, Leicester, UK.
Zillmann, D., & Gan, S. (1997). Musical taste in adolescence. In D. J. Hargreaves & A. C. North (Eds.),
The Social Psychology of Music (pp. 161-188). Oxford: Oxford University Press.
Proceedings paper
Elvira Brattico1, Risto Näätänen1, Tony Verma2, Vesa Välimäki2, & Mari Tervaniemi1
1 Cognitive Brain Research Unit, Department of Psychology, University of Helsinki, Finland
2 Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo,
Finland
INTRODUCTION
In all musical cultures the human auditory system appears to be differentially sensitive to certain
musical intervals. This differential response to sounds is known to be a function of the frequency ratios
between partials or tones. In particular, since the Greek philosopher Pythagoras, the psychological
phenomenon of consonance has been associated with the simplicity of the frequency ratio between
tones. Helmholtz was the first to give a physiological explanation for this ancient observation. He
argued that two complex tones in a complex frequency ratio produce beats, thus provoking the
simultaneous activation of adjacent hair cells of the organ of Corti. According to Helmholtz, the
overload of the input sent to the brain caused the characteristic disturbance of interval sensation
(Helmholtz, 1954; see also Brattico, 1996; Brattico, 1998). More recently, these ideas, and especially
the V-curve of dissonance as a function of the frequency separation of two sine components, were
confirmed almost completely (Plomp & Levelt, 1965; Kameoka & Kuriyagawa, 1969a; Kameoka &
Kuriyagawa, 1969b). In these recent studies, the maximal dissonance was found at 10% frequency
difference in the middle octave with equal SPL, and at 15% frequency difference in the octave below
the middle one for simple tone dyads (Kameoka & Kuriyagawa, 1969a). A different explanation of
the perceptual effect of consonance and dissonance was also offered by Helmholtz and, similarly but
from a phenomenological point of view, by Stumpf. The former attributed the greater pleasantness of
certain intervals or chords to their tonal affinity (Klangverwandtschaft), i.e. to the actual
coincidence between partials. On the other hand, according to Stumpf (for a review, see Schneider,
1997), the so-called tonal fusion (Verschmelzung) corresponded to the "psychological" identification
of two or more sounds as one (see also Dewitt & Crowder, 1987). A more recent interpretation of the
phenomenon has also been suggested, in which dissonance was identified with the psychological
complexity of certain chords as a function of their frequency ratio and of the listeners’
Twelve paid subjects, unselected with regard to their musical background, took part. One
subject was rejected because of noisy EEG. Of the remaining subjects, 4 were male (age range
18 to 27 years; mean age 22). Two subjects reported having had formal music lessons (one in piano for
ten years and the other in singing for five years).
Two conditions were used (see FIGURE 1). In the Tritone condition, three blocks were presented,
each containing the same standard but a different deviant. The first block consisted of a sequence of
standard fifth intervals (g3-d4; interval width: 7 semitones) (p=0.8) interrupted by the deviant
augmented fourth or "tritone" interval (g3-c#4; interval width: 6 semitones) (p=0.2). In the second
block the deviants (same probability) were perfect fourths (g3-c4; interval width: 5 semitones), and in
the third block major sixths (g3-e4; interval width: 9 semitones). In the Seventh condition, the
standard sounds were major sixths (g3-e4; interval width: 9 semitones) (same probabilities as before).
The deviants in the three blocks were, respectively, major sevenths (g3-f#4; interval width: 11
semitones), perfect fifths (g3-d4; interval width: 7 semitones), and perfect octaves (g3-g4; interval
width: 12 semitones). Each deviant interval was presented in a separate block in order not to create
any harmonic context. Moreover, standards of the same pitch were chosen in light of the observation
that consonance ratings change across frequency ranges (see Kameoka & Kuriyagawa, 1969b). In
both the Tritone and Seventh conditions, the interval-width difference between the consonant
standard and the dissonant deviant was one or two semitones smaller than that between the consonant
standard and the consonant deviants.
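For reference, the equal-tempered frequency ratio of each dyad used in the two conditions follows directly from its interval width in semitones (ratio = 2^(n/12)); the listing below simply tabulates the widths given above:

```python
# Equal-tempered frequency ratio for each dyad; ratio = 2 ** (semitones / 12).
dyads = {
    "Tritone condition": {
        "standard fifth (g3-d4)": 7,
        "tritone deviant (g3-c#4)": 6,
        "fourth deviant (g3-c4)": 5,
        "sixth deviant (g3-e4)": 9,
    },
    "Seventh condition": {
        "standard sixth (g3-e4)": 9,
        "seventh deviant (g3-f#4)": 11,
        "fifth deviant (g3-d4)": 7,
        "octave deviant (g3-g4)": 12,
    },
}
for condition, intervals in dyads.items():
    print(condition)
    for name, semitones in intervals.items():
        print(f"  {name}: {semitones} semitones, ratio {2 ** (semitones / 12):.4f}")
```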
In both conditions, piano test sounds from the McGill University Master Samples recordings were
employed (McGill University Master Samples, Volume 3, McGill University, 1988;
http://www.music.mcgill.ca/resources/mums/html/mRecTech.html) (see FIGURE 1). For each note,
the first 350 ms of the sound was extracted from the longer recorded signal. To avoid audible clicks
by ensuring a smooth release, the final 10 ms of the 350 ms signal was multiplied by an exponentially
decaying envelope. The test signals were then normalized to contain the same energy over their
350 ms duration. The silent inter-stimulus interval lasted 600 ms. Stimuli were presented to the
subjects at 50 dB above the individually determined hearing threshold. In all conditions, the stimuli
were presented with the Brain Stimulator software (designed at the Cognitive Brain Research Unit)
and delivered binaurally via headphones with equal phase. Each block consisted of 1000 trials. The
presentation order of the blocks was randomized across subjects, always keeping a distance between
blocks in which the same two intervals served as standard and deviant. This specific case arose
between the block of the Tritone condition in which the standard was a perfect fifth and the deviant a
major sixth, and the block of the Seventh condition in which the same intervals appeared with their
roles reversed. The subjects were instructed to ignore the stimuli and to concentrate on watching a
silent movie of their own choice.
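The preparation of the test sounds (truncation to 350 ms, a 10 ms exponential release, and energy normalization) can be outlined as follows; the sampling rate and the decay constant are assumptions, since neither is specified above:

```python
import numpy as np

def prepare_stimulus(x, fs=44100, dur_s=0.35, fade_s=0.01):
    """Truncate to 350 ms, apply a 10 ms exponential release, and
    normalise energy (fs and the decay rate are assumptions)."""
    n = int(round(dur_s * fs))
    n_fade = int(round(fade_s * fs))
    y = x[:n].astype(float)
    # An exponentially decaying gain over the final 10 ms ensures a
    # smooth release and so avoids an audible click.
    y[-n_fade:] *= np.exp(-np.linspace(0.0, 5.0, n_fade))
    # Equal energy over the 350 ms duration for every test signal.
    return y / np.sqrt(np.sum(y ** 2))

fs = 44100
tone = np.sin(2 * np.pi * 196.0 * np.arange(fs) / fs)  # g3 is roughly 196 Hz
stim = prepare_stimulus(tone, fs)
print(len(stim), float(np.sum(stim ** 2)))
```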
The electroencephalogram (EEG; bandpass 0.1-30 Hz) was recorded from the fronto-central
electrodes Fz, Cz, Pz, and from L1, L2, R1, and R2 (the one- and two-third sites on the arc connecting
Fz with the mastoids on the left and right sides). Vertical and horizontal eye movements were
monitored with electrodes placed lateral to and below the right eye. All electrodes were referenced to
an electrode placed on the tip of the nose. The impedance was kept below 10 kΩ (at 30 Hz)
throughout the experiment. The epoch considered for averaging ranged from 100 ms before to 400 ms
after stimulus onset. Epochs with EEG or EOG artefacts exceeding ±75 µV at any electrode were
rejected from averaging. To familiarize the subjects with the sounds and to avoid novelty effects on
the first trials, the first 15 stimulus trials of the first block and the first 10 trials of the other blocks
were also excluded from averaging. Moreover, the stimuli of the first block of the experiment were
used to measure the hearing threshold of each subject. ERPs were averaged separately for each
stimulus type and condition and then digitally filtered (passband 1-20 Hz; slope 24 dB/octave). The
baseline was taken from 100 ms before stimulus onset to the attack of the sound. For each subject, the
recordings for each condition contained no fewer than 110 accepted responses to the deviant stimuli.
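The epoching and artefact-rejection procedure described above (100 ms pre- to 400 ms post-stimulus epochs, rejection of epochs exceeding ±75 µV on any channel, baseline correction over the pre-stimulus interval) can be outlined as follows; the sampling rate and the synthetic data are assumptions for illustration:

```python
import numpy as np

def epoch_and_average(eeg, onsets, fs=500, pre=0.1, post=0.4, reject_uv=75.0):
    """Cut 100 ms pre- / 400 ms post-stimulus epochs, drop any epoch
    exceeding +/-75 uV on any channel, baseline-correct to the
    pre-stimulus interval, and average.  eeg is channels x samples in uV;
    fs is an assumption."""
    n_pre, n_post = int(pre * fs), int(post * fs)
    kept = []
    for s in onsets:
        ep = eeg[:, s - n_pre : s + n_post]
        if np.abs(ep).max() > reject_uv:      # artefact rejection
            continue
        ep = ep - ep[:, :n_pre].mean(axis=1, keepdims=True)  # baseline
        kept.append(ep)
    return np.mean(kept, axis=0), len(kept)

rng = np.random.default_rng(0)
eeg = rng.normal(0, 5, size=(3, 5000))      # 3 channels of low-level noise
eeg[0, 2040] = 200.0                        # blink-like artefact in one epoch
erp, n_kept = epoch_and_average(eeg, onsets=[1000, 2000, 3000, 4000])
print(erp.shape, n_kept)
```

The epoch around the second onset contains the simulated blink and is rejected; the remaining three epochs are baseline-corrected and averaged.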
The MMN was calculated by subtracting the responses to the standards from the responses to the
deviants. In each condition, MMN peak latencies were determined from the 80-220 ms window in the
individual difference waveforms. The MMN mean amplitudes were measured from 40 ms windows
centered on the individual peaks. The statistical significance of the MMN was evaluated by
comparing the MMN amplitude at the Fz, Cz, L1, and R1 electrodes to zero with one-tailed t-tests. In
the same way, the mean amplitudes of the positivity reversal were individually determined at both
mastoid leads, and their significance was likewise evaluated.
In order to analyze MMNs with maximized amplitude, the ERPs were re-referenced to the average of
the left and right mastoids. MMN individual peak latencies for the re-referenced waves were also
measured from the 80-220 ms window, and the MMN mean amplitudes of the re-referenced
waveforms were then calculated from 40 ms windows centered on the individually determined peaks.
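The MMN quantification described above (deviant-minus-standard difference wave, peak latency within the 80-220 ms window, mean amplitude over a 40 ms window centred on the peak) can be outlined as follows; the sampling rate and the toy waveform are assumptions:

```python
import numpy as np

def mmn_measures(dev, std, fs=500, pre_ms=100):
    """Peak latency (80-220 ms) and mean amplitude (40 ms window round
    the peak) of the deviant-minus-standard difference wave.  fs and the
    epoch timing are assumptions consistent with the 100 ms pre- /
    400 ms post-stimulus epoch."""
    diff = dev - std
    t = np.arange(diff.size) * 1000.0 / fs - pre_ms   # time in ms, 0 = onset
    win = (t >= 80) & (t <= 220)
    # The MMN is a negativity, so its peak is the window's most negative point.
    peak_idx = np.flatnonzero(win)[np.argmin(diff[win])]
    peak_ms = t[peak_idx]
    around = (t >= peak_ms - 20) & (t <= peak_ms + 20)
    return float(peak_ms), float(diff[around].mean())

# Toy difference wave: a 2 uV negativity centred at 150 ms post-onset.
t = np.arange(250) * 2.0 - 100.0
dev = -2.0 * np.exp(-((t - 150.0) ** 2) / (2 * 20.0 ** 2))
std = np.zeros_like(dev)
lat, amp = mmn_measures(dev, std)
print(lat, round(amp, 2))
```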
The MMN amplitudes and the MMN peak latencies were compared between the different blocks of
each condition at the Fz, Cz, L1, and R1 electrode sites by two-way ANOVAs with repeated measures
(STATISTICA software). Moreover, in order to study laterality effects, a two-way ANOVA was also
performed for the MMN amplitudes at L1 and R1 electrodes only. The significance levels for F values
from ANOVAs were Greenhouse-Geisser corrected when appropriate. Post-hoc tests were conducted
by Newman-Keuls comparisons.
RESULTS
The frequency change elicited significant MMNs in all conditions. The MMN reached its maximal
amplitude over the frontal area, peaking on average at about 150 ms after stimulus onset. The
MMN amplitudes for all conditions differed significantly from zero at Fz, Cz, L1 and R1 (t = 4.2 -
13.4, p < 0.005; one-tailed t-tests). The reversed potential at the left and right mastoids was
significantly different from zero in all conditions (t = 4.6 - 12.6, p < 0.0005; one-tailed t-tests).
MMN amplitude
In the Tritone condition (see FIGURE 2, left panel), the MMN amplitude differences were analyzed
by a two-way ANOVA with Deviant (tritone, fourth, sixth) and Electrode (Fz, Cz, L1, R1) as factors.
Main effects of Electrode [F(3, 30) = 3.49; p < 0.03] and Deviant x Electrode interaction [F(6, 60) =
2.78; p < 0.02] were found. To study possible effects of laterality, as indicated by the significant
interaction, a two-way ANOVA was also performed with Deviant (the same as before) and Electrode
(L1 and R1) as factors. Also in this case, the interaction between the two factors was significant [F(2,
20) = 4.29; p < 0.03]. A post-hoc Newman-Keuls test showed that the tritone MMN was larger than
both fourth and sixth MMNs only at the R1 electrode (p < 0.003, p < 0.03, respectively), while the
three deviants did not differ at the L1 site. Moreover, the tritone MMN was significantly larger at the
right electrode (R1) than on the left one (p < 0.01), while the MMNs to the other deviants did not
show different amplitude between the right and left electrodes.
In the Seventh condition (see FIGURE 2, right panel), the MMN amplitude differences were also
analyzed by a two-way ANOVA with Deviant (seventh, fifth, octave) and Electrode (Fz, Cz, L1, R1)
as factors. Main effects of Deviant [F(2, 20) = 8.53; p < 0.002] and Electrode [F(3, 30) = 3.59; p <
0.03] were found. The Newman-Keuls post-hoc test applied to the main effect of Deviant showed that
the MMN amplitude for the seventh deviant was significantly larger than the MMN amplitudes for
the fifth and octave deviants (p < 0.01, p < 0.002, respectively). In addition, a two-way ANOVA
performed only for the L1 and R1 electrodes (with Deviant as the other factor), followed by a
Newman-Keuls post-hoc test, showed that the seventh MMN was larger than the fifth and octave
MMNs at both the L1 and R1 electrodes (p < 0.002 in all comparisons). Moreover, the MMN
amplitude at R1 was significantly larger than that at L1 for all deviants (p < 0.02-0.04).
MMN latency
In the Tritone condition (FIGURE 2, left panel, bottom), the MMN latencies at the Fz, Cz, L1, and R1
electrode sites were also analyzed by a two-way ANOVA with Deviant and Electrode as factors. A
main effect of Deviant [F(2, 20) = 6.52; p < 0.007] was found. A Newman-Keuls post-hoc test revealed that
the tritone MMN occurred significantly later than the fourth and sixth MMN (respectively, p < 0.02, p
< 0.009), while there was no significant difference between the MMN latency of the fourth and sixth
deviants. On the contrary, in the Seventh condition (FIGURE 2, right panel, bottom), no significant
latency differences were found.
DISCUSSION
The first objective of the present experiment was to investigate the pre-attentively encoded
similarity/dissimilarity between consonant and dissonant intervals. The second parallel objective was
to analyze the way those intervals are processed by the human auditory system. The present data
indicate that dissonant intervals are processed in a specific way when compared to consonant intervals
(the perfect fifth and the major sixth used as standards). In particular, against the sixth standard, the
seventh deviant evoked a larger MMN than the fifth and octave deviants. By contrast, against the fifth
standard, the tritone deviant did not elicit a larger MMN than the fourth and sixth deviants, but a later
one. The slower processing of that interval is probably
due to its frequency ratio complexity. The enhanced MMN amplitude in the case of the seventh
deviant is related to its larger dissimilarity with the standard, but could be also related to the presence
of beats.
Moreover, the present data suggest that the tritone deviant could be processed more in the right
cerebral hemisphere than the other consonant intervals of the same condition (the fourth and the
sixth), being consistent with the results regarding emotional processing by Blood, Zatorre, Bermudez,
& Evans (1999).
Our findings thus suggest that the different ERP responses to musical intervals cannot have been
generated by differences on a physical dimension alone. If that were so, one could not explain why
the MMN latency and laterality differed in response to intervals varying only in their pitch width and
their degree of consonance while all other parameters were kept constant. Future studies should
therefore focus on the localization of the processing of sound combinations differing in their
frequency ratios and in the way they fuse, keeping in mind that the processing of intervals, chords, or
phonemes does not correspond to the quantitative summation of their components but involves
qualitatively new mechanisms.
ACKNOWLEDGEMENTS
We thank Prof. M. Karjalainen, Ms. Kuusi, and Mr. C. Erkut for their support. This study was
supported by the Academy of Finland.
FIGURE LEGENDS
FIGURE 1. Schematic illustration of the stimulation used in this experiment. Left panel, top: Musical
representation of the stimuli used in the Tritone condition. The three bars, representing the three
experimental blocks, contain repeated fifth standard intervals and, respectively, the tritone, fourth, and
sixth deviants. The pause between the sounds represents the silent inter-stimulus interval (ISI) of 600
ms used in the experiment. Left panel, bottom: Physical representation of the stimuli in the Tritone
condition. The first envelope from the top shows the standard and the other envelopes the three
deviants. Right panel, top: Musical representation of the stimuli used in the Seventh condition. The
three bars, representing the three experimental blocks, contain repeated sixth standard intervals and,
respectively, the seventh, fifth, and octave deviants. As in the Tritone condition, the pause between
the sounds represents the silent inter-stimulus interval (ISI) of 600 ms of the experiment. Right panel,
bottom: Physical representation of the stimuli in the Seventh condition. The first envelope from the
top shows the standard and the other envelopes the three deviants.
FIGURE 2. Left panel, top: The MMN difference waves for the three deviants of the Tritone
condition at L1, Fz, and R1 electrodes. Left panel, bottom: MMN mean amplitudes (left) and latency
(right) for the three deviants of the Tritone condition averaged across Fz, Cz, L1, and R1 electrodes.
Right panel, top: MMN to the three deviants of the Seventh condition at L1, Fz, and R1. Right panel,
bottom: MMN mean amplitudes (left) and latency (right) for the three deviants of the Seventh
condition averaged across Fz, Cz, L1, and R1 electrodes.
Proceedings paper
Steven Jan
School of Academic Studies, Royal Northern College of Music, 124 Oxford Road, Manchester M13 9RD, United Kingdom.
E-mail: steven.jan@rncm.ac.uk
At its most radical, the principle of universal Darwinism maintains that some of the most remarkable and powerful things in the
universe are a class of entities Dawkins calls replicators, which he defines as "...anything in the universe of which copies are made.
Examples are a DNA molecule, and a sheet of paper that is xeroxed" (1983: 83). These, he contends, are the fundamental units of
selection in a universe-wide process of Darwinian evolution. A class of such entities is the meme, the subject of the present paper,
defined by Dawkins as a "...unit of cultural transmission...a unit of imitation" (1989: 192).
As a means of understanding the nature of culture, the memetic paradigm has received increasingly serious attention in recent years,
despite attempts from its inception to downgrade it to the status of a "meaningless metaphor" (Stephen Jay Gould, in Blackmore
1999: 17). From the ranks of its advocates, Derek Gatherer has warned of the danger of memetics' becoming "...merely a
meta-narrative having no more right to call itself scientific than dialectics..." (1997: 83). Despite such criticism, this attention has
culminated in three recent book-length studies (Brodie 1996; Lynch 1996; Blackmore 1999) and in the inception, in 1997, of a
dedicated online Journal of Memetics.
If one accepts the validity of the memetic paradigm-that human culture is an ecology of independent particulate entities which
optimize their chances of survival to the degree that they maximize their tendency to imitation-then it is reasonable to attempt to
apply it to music. After all, music is a stream of sound information which, in its generation and perception, is segmented into
discrete, particulate units. Moreover, a memetic perspective on music would draw heavily upon psychology, for our innate perceptual
and cognitive competencies are part of the long-term environment of the meme, and our neural structures their fundamental physical
incarnation.
While constraints of space prevent a detailed exposition of what a memetics of music might constitute, I hope to provide here an
outline of its main premises, with reference to some of the principal concerns of music psychology. In particular, I shall consider how
our innate perceptual and cognitive attributes affect the meme, and how hierarchical aspects of musical structure and perception
relate to the claims of memetics.
As an element of the act of cognition, we subject music to the operation of segmentation, dividing the stream of sound information into
discrete units in order to facilitate processing. Our perceptual and cognitive faculties are attuned to obvious points of articulation, such as
pauses, cadences, and changes in material. More fundamentally, however, our comprehension of patterning is controlled by attributes
identified by the Gestalt tradition of psychology. Deutsch observes that
...we group elements into configurations on the basis of various simple rules.... One is proximity: closer elements are grouped together in preference
to those that are spaced further apart.... Another is similarity.... A third, good continuation, states that elements that follow each other in a given
direction are perceptually linked together.... A fourth, common fate, states that elements that change in the same way are perceptually linked
together. As a fifth principle, we tend to form groupings so as to perceive configurations that are familiar to us.
(1999: 300)
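The proximity principle in Deutsch's list lends itself to a simple operational sketch: segment a stream of note onsets wherever the inter-onset interval exceeds a threshold. The threshold value below is an illustrative assumption:

```python
def group_by_proximity(onsets, gap=0.5):
    """Segment a sequence of note onsets (in seconds) wherever the
    inter-onset interval exceeds a threshold -- a crude model of the
    Gestalt proximity principle (the gap value is an assumption)."""
    groups, current = [], [onsets[0]]
    for prev, nxt in zip(onsets, onsets[1:]):
        if nxt - prev > gap:
            groups.append(current)
            current = []
        current.append(nxt)
    groups.append(current)
    return groups

# Two three-note groups separated by a long pause:
print(group_by_proximity([0.0, 0.25, 0.5, 2.0, 2.25, 2.5]))
# -> [[0.0, 0.25, 0.5], [2.0, 2.25, 2.5]]
```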
Perhaps the most thorough application of such pattern-perception principles to musical analysis is Eugene Narmour's
implication-realization model (1977, 1989, 1990, 1992, 1999), which draws strongly on the Gestalt-inspired groundwork established by
Leonard Meyer (1956, 1973, 1989). The implication-realization model offers means of tracking the note-to-note implicative flux of a
melody and identifying its points of procession and closure at various hierarchical levels. Narmour notes that
...the separate registral and intervallic aspects of small intervals...are said to be implicatively governed from the bottom up by the Gestalt laws of
similarity, proximity, and common direction.... As perceptual-theoretical constants, what is important to notice about the invocation of such Gestalt
laws is (1) that they have been shown to be highly resistant to learning and thus may be innate...; (2) unlike the notoriously interpretive, holistically
supersummative, top-down Gestalt laws of "good" continuation, "good" figure, and "best" organization...the Gestalt laws of similarity, proximity,
and common direction are measurable, formalizable, and thus open to empirical testing....
(1989: 46-47)
The following example illustrates the relationship between grouping principles and memetic replication. The opening of Example 1 i, from
the Act I finale of Mozart's Die Zauberflöte, is replicated at the opening of Schubert's song "Heidenröslein," Example 1 ii. In particular,
Schubert imitates the pitch segment marked x, allowing us to regard it as a meme. According to Narmour's segmentation principles, the
minim a2 in the second bar of each melody is a point of articulation, for
...incisive points of melodic closure creating pitch groupings take place when in the parameter of duration a short note moves to a long note...; we
may mark such durationally cumulative places analytically with the symbol (d) over the closed melodic note, the "d" standing for durational
interference in the continuation of the melodic line....
(1989: 45)
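Narmour's durational-cumulation criterion can likewise be stated operationally: mark a closural point (d) wherever a short note moves to a longer one. A minimal sketch, with durations expressed in beats:

```python
def durational_closures(durations):
    # Indices at which a short note moves to a longer one: Narmour's
    # durationally cumulative points, marked analytically with (d).
    return [i + 1 for i, (a, b) in enumerate(zip(durations, durations[1:])) if b > a]

# Crotchet, crotchet, minim, crotchet, minim (durations in beats):
print(durational_closures([1, 1, 2, 1, 2]))
# -> [2, 4]
```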
Given that the segments are identical, it follows that the implication-realization closural structure of the meme is unchanged. It consists of
three melodic-implicative units: D (reiteration of a pitch), IP (realization of implied intervallic similarity but denial of implied registral
direction), and P (realization of both implied intervallic similarity and registral direction) (for a full explanation of the meaning of these
symbols, see Narmour 1989, 1990, 1992):
Assuming direct imitation by Schubert, it is unlikely that he perceived a segmentation of the first four bars such as
that shown in Example 1 iv, which conflicts strongly with the concept of durational interference; the segmentation given in Example 1 iii
would clearly have had more cognitive reality for him. It will be understood from this simple example that the gene-meme interface is
always significant. To say that a meme will prosper if it takes advantage of the perceptual and cognitive environment provided by genes
(the hierarchical level of laws, discussed in Section 4.1 below) is insufficient; a meme is in large part defined by the template of that
environment.
1. Coequality
As is seen in Example 1 i and ii above, the identification of constituent memes in music is based upon the principle of coequality-the
presence of an overlapping string of data which allows the initial and terminal pitches and medial content of the meme, in both contexts, to
be defined by reference to that segment which is copied. Without the presence of a coequal, a particle would not be a meme; it would be, in
Lynch's terminology, a mnemon-"[a]n item of brain-stored memory. When copied from one brain to another, it becomes a meme" (1998).
It is clear from this example that the longer an imitated passage, the greater the statistical probability that the later copy is a conscious
(self-)quotation. Conversely, many very short coequals, of three notes or more, may be so anonymous as to be hard to situate in any nexus
of imitation, in the absence of what Nattiez terms strong poietic evidence (1990: 10-16). These particles exist as the common currency of a
style, to which all practitioners of a given period and location had access. In western tonal music such patterns (style forms, in Narmour's
term, discussed in Section 4.1 below) include basic scale-degree progressions, such as the pattern 3̂-2̂-1̂, and simple harmonic
progressions, such as the cadential sequence ii6/3-V-I. This continuum of imitative relationships is represented below:
As with the gene, the longer and more complex the meme, the more susceptible it is to fragmentation and miscopying upon replication. It
follows from this that long memes have lower copying-fidelity than short, yet perhaps have higher psychological salience; by contrast,
memes which are too short, perhaps of fewer than three or four elements, lack the necessary prominence which ensures their fecundity
(Dawkins 1989: 18, 194).
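The identification of coequals can be sketched as a search for maximal shared substrings between two pitch sequences, with a minimum length reflecting the three-to-four-element threshold just mentioned. The two pitch strings below are hypothetical placeholders, not the actual Mozart and Schubert melodies of Example 1:

```python
def coequals(a, b, min_len=3):
    """Maximal substrings shared by two pitch sequences: candidate memes.
    Particles shorter than min_len are ignored, reflecting the claim that
    very short patterns lack the prominence needed for reliable imitation."""
    found = set()
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                found.add(tuple(a[i:i + k]))
    # Discard matches that are substrings of a longer match.
    def contained(m, n):
        return m != n and any(n[p:p + len(m)] == m for p in range(len(n) - len(m) + 1))
    return {m for m in found if not any(contained(m, n) for n in found)}

mozart = ["C5", "C5", "E5", "G5", "A5", "A5"]      # hypothetical pitch strings
schubert = ["G4", "C5", "C5", "E5", "G5", "A5"]
print(coequals(mozart, schubert))
```

Run on these placeholder strings, the search returns the single five-note shared segment; a lone pitch sequence with no coequal would, in Lynch's terms, remain a mnemon rather than a meme.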
1. Memetic Hierarchies
The meme exists as part of a rich complex of hierarchic levels which operate in two basic dimensions, cultural and structural. While
the first dimension is the province of the historian and style analyst, the second is the domain of the music theorist; indeed it is
perhaps true to say that over the last century the central preoccupation of music theory has been to model the internal hierarchical
Depending upon its intrinsic psychological salience, a nascent meme arising at the level of intraopus style may eventually
come to be propagated, via the level of idiom (the style of a composer), at the level of dialect-the meme becomes part of the
compositional repertoire of all the composers of a given chronological or geographical locus. The implication-realization
tradition defines the basic uniparametric patterns which populate the dialect as style forms; in their specific contexts-i.e., at the
intraopus level-style forms exist as syntactic, multiparametric style structures. If a meme is propagated in several dialects, it
will contribute to the structure of the system of rules ultimately mediating the replication of all memes at lower hierarchic
levels. Beyond this, however, the level of laws is governed by innate (i.e., genetically-determined) attributes of perception and
cognition, such as the Gestalt principles examined in Section 3.1 above.
The movement of a meme outwards from the centre of Figure 2 is described in epidemiological terms by those
commentators-starting with Juan Delius (1986)-who see memetics as the study of "thought contagion" (Lynch 1996) by
"viruses of the mind" (Brodie 1996). The infectivity-or cultural fitness (Cavalli-Sforza and Feldman 1981: 17)-of a meme is an
index of its intrinsic appeal to the environment of a brain, which is circumscribed both by innate perceptual and cognitive
attributes, and by the receptivity to incursion of the complement of memes already encoded therein.
2. Structural Hierarchies
Memes exist within a work at hierarchic levels other than the immediate foreground; they are generated at higher structural levels by
memes at lower levels. While perhaps the most obvious means of conceptualizing such intraopus hierarchies, the Schenkerian model
has been justly criticized for its axiomatic imbalance-the a priori generation of every structure recursively downwards from the
Ursatz-which is at odds with the interplay between the "bottom-up" and "top-down" operations which characterize musical
processing. By contrast, the implication-realization model discussed in Section 3.1 above is one system which, while capable of
accounting for hierarchical structures, attempts to reconcile top-down and bottom-up mechanisms by taking account of lower-level
procession and closure. Moreover, it lends itself well to mapping memetic replication at various strata within a work.
One fundamental unit in intraopus hierarchies is the style structure (see Section 4.1 above), which might be understood as
...a kind of "theme" that listeners implicatively map from the top down onto incoming foreground "variations." They hear different melodic variants on
lower levels as creating similar structural-tone "themes" on higher levels.
The following diagram represents in abstract fashion how a style structure consists of a constellation of pitch structures, each of
which is demarcated by Gestalt closural principles. These generate (bottom-up), and are perceived in terms of (top-down), a
middleground "theme" at a higher durational level:
Figure 3: Implication-Realization Hierarchies (after Narmour 1999: Figure 2)
Note: Dotted vertical lines represent transformation of initial and terminal pitches of lower-level groupings to higher levels. Brackets above units are
implication-realization spans, as in Example 1 i and ii above. The diagram simplifies the relationship between units for, as in Example 1, terminal pitches
of one unit may simultaneously function as the initial pitch of another.
Memes may assemble to form confederations termed coadapted meme-complexes, or memeplexes (Blackmore 1999: 19). If it is
accepted that the four foreground-level memes which generate the first structure at the intermediate durational level in Figure 3 might
theoretically occur in another context, then while existing as an intermediate-level meme (it is memetic at this level because the same
intermediate-level structure occurs in these two hypothetical contexts), it also exists as a replicated complex of patterns at the
foreground level. Each individual foreground pattern is a meme (it exists in this form in these two contexts, and may exist
independently in other contexts), but each complex is also memetic. Furthermore, applying the same logic recursively to higher
levels, if it is accepted that the two intermediate-level memes which generate the structure at the highest level in Figure 3 might
theoretically occur in another context, then while existing as a highest-level meme (it is memetic at this level because the same
highest-level structure occurs in these two hypothetical contexts), it also exists as a replicated complex of patterns at the intermediate level.
Clearly real units at the foreground level generate virtual configurations at higher structural levels. It may be the case that the same
pattern is replicated at more than one level in a work; or, in different works, the same pattern may be propagated not at their
foregrounds, but at higher levels. In this second case, on a strict definition, these are not units of direct imitation, but they are units of
consequential replication and are therefore memetic. The structure of such higher-level memes is potentially instructive for what it
can tell us about the conglomerative grammar of foreground orientated memes, which is ultimately a function of their initial and
terminal nodes, pitch content, and the way these elements interact with our cognitive attributes.
Example 2: The 1̂-7̂...4̂-3̂ Changing-Note Schema: after Haydn, Minuet from Divertimento in C major Hob. XIV: 10
(c. 1760), bb. 1-4
In perception, bottom-up processes at the opening of this phrase will first identify the component features of the initial event, but
without at this stage comprehending their broader context. At some point within the initial event, the cumulative evidence of the
From this one might infer that such cognitive schemata create selection pressure in favour of memetic conformance. In Example 2,
for instance, the bass pattern c1-g1-f1-e1-d1 in bb. 1-2 and the corresponding b-f1-e1-d1-c1 in bb. 3-4, together with their associated
rhythmic meme, while different memes (because of their dissimilar scale-degree orientation and contrasting internal intervallic
structure), may achieve comparable population sizes in the dialect because of their membership of a schema which favours
parallelism between its initial and terminal melodic events.
(1995: 343)
In memetic terms, the variation (mutation) of memes creates the required "abundance"; the replication of memes is, of course, one of
their defining attributes, furnishing the necessary "heredity"; and it is the case that novel (mutant) memes are more likely to be
imitated than those already established in the meme pool, imparting to them a higher "differential fitness."
If a meme is perceived as a variant of an existing form, this deviation from the antecedent configuration may aid its differential
fitness. This may be because the mutation has the effect of increasing its implicative energy, which might in turn have the effect of
raising its cultural fitness. For instance, meme x from Example 1 i and ii above might be mutated by changing the closing a2 of its
second bar to e2. The resulting terminal IP structure (replacing the original P) is less closed-Narmour regards the IP shape as an
example of "partial denial" of implications (1989: 48)-and this characteristic arguably makes the mutant meme more psychologically
salient, and therefore more fecund, than its antecedent:
Example 3: Memetic Mutation, after Mozart: Die Zauberflöte K. 620 (1791) no. 8, bb. 327-328
If it is the case that such changes affect the propensity of memes to imitation, then the meme is clearly an active replicator, defined
by Dawkins as
...any replicator whose nature has some influence over its probability of being copied. For example a DNA molecule, via protein synthesis, exerts
phenotypic effects which influence whether it is copied: this is what natural selection is all about. A passive replicator is a replicator whose nature has no
influence over its probability of being copied. A xeroxed sheet of paper at first sight seems to be an example, but some might argue that its nature does
influence whether it is copied, and therefore that it is active: humans are more likely to xerox some sheets of paper than others, because of what is written
on them, and these copies are, in their turn, relatively likely to be copied again.
(1983: 83)
Such changes at the level of the individual meme have a cumulative effect which eventually leads to changes at higher hierarchic
levels. The 1̂-7̂...4̂-3̂ schema discussed in Section 5 above, for instance, gradually declined in population density in the
early-nineteenth century because of the consequences of changes to its constituent features (see the population distribution curve in
Gjerdingen 1988: 263, which gives its apogee as c. 1773). As these components changed, so did the resultant middleground style
structure; although the distinction is fuzzy and ultimately subjective, beyond a certain point of alteration a style structure-indeed any
meme-is no longer the same pattern mutated, but a different pattern.
Ultimately, however, memetics even undermines the notion of a unitary conscious self, seeing this as merely a memeplex, albeit a
large and sophisticated one; Blackmore, the leading advocate of this argument, speaks provocatively of
...the most insidious and pervasive memeplex of all....the 'selfplex.' The selfplex permeates all our experience and all our thinking so that we are unable to
see it clearly for what it is-a bunch of memes.
(1999: 231)
This interpretation clearly challenges received conceptions of the creative process in music. Whereas the composer is traditionally
seen as in full conscious control of the creation of a work, many accounts given by composers themselves suggest the validity of a
memetic analysis. Despite the variable authenticity of some such remarks, sufficient consensus exists among them to suggest that the
composer is
...not so much conscious of his ideas as possessed by them. Very often he is unaware of his exact processes of thought till he is through with them;
extremely often the completed work is incomprehensible to him immediately after it is finished.
In the ontogeny of a musical work, memes rage within the composer's selfplex, gradually conglomerating memotypically to engender
the finished structure in its phemotypic incarnation. Adopting Dennett's computer analogy, the "greatness" of the product is a
function of the composer's memory capacity, neuropsychological processing power, and the richness of the environment (the
complexity of the software) to which he or she is exposed.
In conclusion, despite these challenges, and despite the reservations some might have about the controversial issues noted in Section 7
above, there is much to commend the memetic paradigm as relevant to the concerns of musicology and music psychology. Firstly,
given that musical analysis is "...the resolution of a musical structure into relatively simpler constituent elements, and the
investigation of the functions of those elements within that structure" (Bent and Drabkin 1987: 1), it is legitimate to attempt to cleave
the structure at perceptual/cognitive-imitative joints, for these articulations, as noted in Section 3.1 above, have strong psychological
reality for composers and listeners. Secondly, the memetic perspective is fully concordant with the synchronic view of music as a
multileveled hierarchic structure and the diachronic view of music as a timeline of imitative connection manifesting change over
time. Thirdly, there is much common ground between musicology, music psychology, and memetics; the area of overlap
between the three disciplines has the potential to be a place of fruitful interdisciplinary collaboration.
© Steven Jan, May 2000.
6. References
Bharucha, J.J. (1999) Neural nets, temporal composites, and tonality. In D. Deutsch (Ed.). The
Psychology of Music (Academic Press Series in Cognition and Perception), 2nd edn. San Diego,
Academic Press. pp 413-440.
Blackmore, S.J. (1999) The Meme Machine. Oxford, Oxford University Press.
Brodie, R. (1996) Virus of the Mind: The New Science of the Meme. Seattle, Integral Press.
Cavalli-Sforza, L.L. and Feldman, M.W. (1981) Cultural Transmission and Evolution: A
Quantitative Approach (Monographs in Population Biology, no. 16). Princeton, Princeton
University Press.
Dawkins, R. (1983) The Extended Phenotype: The Long Reach of the Gene. Oxford, Oxford
University Press.
Gatherer, D. (1997) The evolution of music: a comparison of Darwinian and dialectical methods.
Journal of Social and Evolutionary Systems, 20/1, 75-92.
Gjerdingen, R.O. (1988) A Classic Turn of Phrase: Music and the Psychology of Convention.
Philadelphia, University of Pennsylvania Press.
Hewlett, W.B. and Selfridge-Field, E. (Eds.) (1998) Melodic Similarity: Concepts, Procedures,
and Applications (Computing in Musicology, no. 11). Cambridge, MA, MIT Press.
Jan, S.B. (2000b) The selfish meme: particularity, replication, and evolution in musical style.
International Journal of Musicology 8 (in press).
Jan, S.B. (2002) The illusory Mozart: selfish memes in the priests' marches from Idomeneo and
Die Zauberflöte. International Journal of Musicology 10 (forthcoming).
Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, MA,
MIT Press.
Lynch, A. (1996) Thought Contagion: How Belief Spreads Through Society-The New Science of
Memes. New York, Basic Books.
Lynch, A. (1998) Mnemon 1998a: Y2K Memes (Issue 1).
<http://www.mcs.net/~aaron/Mnemon1998a.html>
Meyer, L.B. (1956) Emotion and Meaning in Music. Chicago, University of Chicago Press.
Meyer, L.B. (1973) Explaining Music: Essays and Explorations. Chicago, University of Chicago
Press.
Meyer, L.B. (1989) Style and Music: Theory, History, and Ideology. Philadelphia, University of
Pennsylvania Press.
Narmour, E. (1977) Beyond Schenkerism: The Need for Alternatives in Music Analysis. Chicago,
University of Chicago Press.
Narmour, E. (1989) The 'genetic code' of melody: cognitive structures generated by the
implication-realization model. In S. McAdams and I. Deliège (Eds.). Music and The Cognitive
Sciences. London, Harwood. pp 45-63.
Narmour, E. (1990) The Analysis and Cognition of Basic Melodic Structures: The
Implication-Realization Model. Chicago, University of Chicago Press.
Narmour, E. (1992) The Analysis and Cognition of Melodic Complexity: The
Implication-Realization Model. Chicago, University of Chicago Press.
Narmour, E. (1999) Hierarchical expectation and musical style. In D. Deutsch (Ed.). The
Psychology of Music (Academic Press Series in Cognition and Perception), 2nd edn. San Diego,
Academic Press. pp 441-472.
Nattiez, J.-J. (1990) Music and Discourse: Toward a Semiology of Music. Tr. C. Abbate.
Princeton, Princeton University Press.
Plotkin, H.C. (1995) Darwin Machines and the Nature of Knowledge: Concerning Adaptations,
Instinct and the Evolution of Intelligence. London, Penguin Books.
Sloboda, J.A. (1996) The Musical Mind: The Cognitive Psychology of Music (Oxford Psychology
Series, no. 5). Oxford, Clarendon Press.
Proceedings paper
3-second silence interval. Subjects were asked to identify the tone by pointing to the corresponding key on a
paper keyboard. The test administrator recorded their choice of keys and later calculated the number of correct
identifications and the number of incorrect identifications that were off by one semitone. The accuracy of these
calculations was corroborated by a second test administrator. Subjects were considered to possess absolute pitch
if they met two criteria: (1) they identified a minimum of 85% of the pitches accurately, and (2) their incorrect
identifications, if any, were off target by only a semitone in at least 85% of the errors. All the
subjects who met criterion 1 also met criterion 2 (n = 12). These subjects were classified as absolute pitch
possessors. Six subjects identified between 58% and 77% of the tones accurately. The incorrect identifications of
five of these subjects were off target by only a semitone in most of the errors (88%-100%), and those of the
remaining subject were off by a semitone in 54% of the errors. Although these six subjects identified pitch in
isolation better than did the relative pitch possessors tested by Zatorre et al. (1998), they did not meet the two
criteria fulfilled by absolute pitch possessors in the present study. These subjects were classified as pseudo
absolute pitch possessors and excluded from the comparisons of absolute and relative pitch possessors.
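The classification rule above can be made concrete. Below is a minimal stdlib-Python sketch, assuming each trial is recorded as a signed error in semitones (0 = correct); the thresholds for the pseudo-absolute group are read off the figures reported above, not criteria the authors state explicitly:

```python
def classify_pitch_ability(errors_semitones, min_correct=0.85, min_near=0.85):
    """Classify one subject from per-trial identification errors.

    errors_semitones: signed errors in semitones, one per trial (0 = correct).
    Returns 'absolute', 'pseudo-absolute', or 'other'. The data layout and
    the pseudo-absolute thresholds are assumptions for illustration.
    """
    n = len(errors_semitones)
    correct = sum(1 for e in errors_semitones if e == 0)
    wrong = [e for e in errors_semitones if e != 0]
    prop_correct = correct / n
    # Criterion 2: among the errors, the proportion off by exactly one semitone.
    prop_near = (sum(1 for e in wrong if abs(e) == 1) / len(wrong)) if wrong else 1.0
    if prop_correct >= min_correct and prop_near >= min_near:
        return "absolute"
    # The intermediate group described above: 58-77% correct, errors mostly
    # (54% or more) a semitone off target.
    if 0.58 <= prop_correct <= 0.77 and prop_near >= 0.54:
        return "pseudo-absolute"
    return "other"
```

For example, a subject with 90% correct responses whose errors are all one semitone off would be classified as an absolute pitch possessor under both criteria.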
The data from two subjects with hearing impairments and two subjects who did not follow test directions
properly were discarded. The final sample comprised 12 musicians with absolute pitch, 11 musicians with
relative pitch, six musicians with pseudo absolute pitch, and 12 nonmusicians. There were two left-handed
subjects in each of the groups and a total of 18 female and 23 male subjects in the sample.
Results
In order to establish whether there were differences in spatial abilities among absolute pitch possessors, relative
pitch possessors, and nonmusicians, I analyzed their scores in the three tests (Hidden Figure Test, the Spatial
Subtest, and the Object Assembly Subtest) through three analyses of variance. The results showed that Group
affected scores in the Hidden Figures Test, F (2,34) = 8.33, p = .001, but did not affect subjects' performance in
the Object Assembly and Spatial Subtests. Scheffé comparisons indicated that absolute pitch possessors obtained
significantly higher scores, p < .05, than did relative pitch possessors and nonmusicians in the Hidden Figures
Test (Table 1). There was no difference between the Hidden Figures Test scores of nonmusicians and musicians
with relative pitch.
Table 1
Mean test scores of nonmusicians, absolute pitch possessors, and relative pitch possessors
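For reference, the F ratio behind such a one-way analysis of variance can be computed directly from the group scores. The following is a stdlib-only sketch (an illustrative helper, not the study's data or software):

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: a list of lists of scores, one inner list per group
    (e.g. nonmusicians, absolute pitch possessors, relative pitch possessors).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-groups sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared deviations from each group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Example with made-up scores for two groups of three subjects:
F, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6]])
```

The returned F is compared against the F distribution with (df_between, df_within) degrees of freedom to obtain the p value reported in the text.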
The results of the analyses do not support previous findings regarding the superior spatial abilities of musicians
as compared to nonmusicians because no differences in spatial performance could be established between
musicians with relative pitch and nonmusicians. In order to study the effect of music training on spatial abilities
further, I performed multiple regression analyses with years of music instruction and years of participation in
ensembles as the independent variables and the three test scores as the dependent variables. The data from the six
subjects classified as pseudo absolute pitch possessors were included in these analyses because the classification into
absolute pitch or relative pitch possessors was superfluous here. The model explained almost 20% of the variance in
the Hidden Figures Test scores (R = .49; adjusted R-squared = .19, p = .01) and showed that years of music
training, and not years of ensemble participation, affected subjects' performance in this test (years of music training:
These results suggest that the differences between the absolute pitch possessors and the other musicians and
nonmusicians could be attributed to the early musical training received by the former group. In order to explore
this idea further, I studied the data from the absolute and relative pitch possessors included in the original
analyses taking into consideration the age at which they began music instruction. ANOVAs with Group (absolute
pitch or relative pitch possessors) and Starting Age (3/4/5 or 6+ years) as the independent variables indicated that
Group affected the Hidden Figure Test scores F (1,19) = 4.53, p =.05. No other significant main effects or
interactions were found. It is important to consider the small number of subjects included in these exploratory
analyses: only four relative pitch possessors had begun music instruction by age 5, and four absolute pitch
possessors had done so after that age. The number of absolute pitch possessors with early musical training and
relative pitch possessors with late musical training was higher (n = 8 and 7 respectively).
Discussion
The results of the study show that a selected group of musicians, those who had absolute pitch, performed better
in the Hidden Figure Test than did nonmusicians and musicians with relative pitch. No other differences among
absolute pitch possessors, relative pitch possessors, and nonmusicians could be established.
Based on the results of previous studies about the superior spatial abilities of musicians as compared to
nonmusicians and the effects of music instruction on spatial development, I had expected to find clear differences
in the spatial scores of musicians and nonmusicians. In this study, musicians had extensive music training (M =
15.5 years) and nonmusicians had no formal instruction or ensemble experience. Although the mean scores of the
nonmusicians were lower than those of the musicians, in none of the three spatial tasks used in the study did
musicians with relative pitch outperform nonmusicians and only in the Hidden Figures Test did musicians with
absolute pitch obtain significantly higher scores than did nonmusicians.
Interestingly, absolute pitch possessors not only outperformed nonmusicians in the Hidden Figures Test, but
they also outperformed musicians with relative pitch. This difference between absolute pitch and relative pitch
possessors cannot be exclusively attributed to music training given that the type and length of the training
received by both groups were very similar. The only obvious difference between the two groups of musicians
was the age at which they began formal music instruction. Absolute pitch possessors began taking lessons at a
younger age (4.5 years) than did relative pitch possessors (7.4 years) suggesting that music training may improve
the performance in specific spatial tasks only if it occurs very early in life.
Other results of the study support the idea that the age at which music training is initiated affects the relationship
between music training and spatial development. When I disregarded the information about subjects' possession
of absolute pitch or relative pitch and analyzed the data according to the age at which they began formal music
instruction, I found that musicians who started music lessons during the first five years of life scored significantly
higher in the Hidden Figure Test and in the Object Assembly Subtest than did nonmusicians. Musicians who
began music instruction after their sixth birthday did not score significantly better than nonmusicians in any of
the spatial tests and scored significantly lower than the musicians with early musical training in the Spatial
Subtest. Apparently, extensive music instruction may be associated with enhanced performance in certain spatial
tasks only if provided from a very young age.
There is some evidence that early musical instruction may not only be associated with enhanced performance in
certain spatial tasks, but actually improves the development of spatial abilities. While research conducted with 3-,
4-, and 5-year-olds concluded that music instruction improved children's performance in the Object Assembly
Subtest of the Wechsler Intelligence Scale for Children (Gromko & Poorman, 1998; Rauscher et al., 1997),
studies developed with older children did not find such improvements or found only temporary improvements in
certain spatial tasks (Costa-Giomi, 1999; Hurwitz et al., 1975; Persellin, 2000). In the present study, there were
no differences between the Object Assembly Subtest scores of subjects with extensive musical training (i.e.,
musicians with relative and absolute pitch) and those with no formal musical training (i.e., nonmusicians).
However, musicians who began taking music lessons during their first five years of life outperformed
nonmusicians in this test. These findings corroborate that the age at which music training is initiated affects the
relationship between music instruction and spatial development.
Because starting age of musical instruction affected test scores, it could be assumed that the superior performance
of absolute pitch possessors in the Hidden Figure Test reported earlier was the result of their participation in
music instruction from a young age. While most absolute pitch possessors began taking lessons by age 5 (66%),
most relative pitch possessors did so after age 7 (64%). Despite these obvious differences between absolute and
relative pitch possessors, it is not clear that starting age of musical instruction is the cause for their performance
in the Hidden Figures Test. Exploratory analyses that looked at the effects of both starting age and absolute
pitch on partial data (nonmusicians were excluded from these analyses) indicated that absolute pitch, and not
starting age, affected the scores in this test. Additionally, no interaction between the two variables was found.
Because of the small sample size of these particular analyses, their results should be taken with caution.
However, the findings indicate that there are differences other than starting age of musical instruction between
absolute pitch possessors and relative pitch possessors that affect their performance in a nonmusical task. Future
Proceedings paper
The perception of rhythmic patterns exhibits certain features of categorical perception, including abrupt category boundaries and nonmonotonic
discrimination functions (Clarke, 1987; Schulze, 1989). Other attributes of rhythm categorization, however, such as good within-category
discrimination and strong dependence on context, have special implications for the perception of musical rhythm. It has been suggested that two
processes operate in rhythm perception: one assigns rhythms to categories depending on metrical context, while the other interprets deviations from those categories
as expressive (Clarke, 1987). This interpretation possesses a certain circularity, however. Perceived rhythmic patterns influence the perception of
metrical structure, while metrical structure influences the perception of rhythmic patterns. How does a temporal sequence give rise to a structural
interpretation? How does metrical structure subserve both categorization and discrimination? Which temporal fluctuations are expressive, and which
force structural reinterpretation? The current study aims to address these issues by investigating the role of rhythmic context in the categorization of
temporal patterns.
Background
In a pioneering study, Clarke (1987) demonstrated that the categorization of rhythmic patterns was sensitive to metrical context. Music students
listened to short musical sequences in which the durations of the final two time intervals were varied systematically to create an interval ratio between
1:1 and 2:1, inclusive. Musicians were asked to categorize the ratio of the final two intervals as either 1:1 or 2:1. The musical sequences were
presented in two blocks, one providing the context of duple meter, the other a context of triple meter. Clarke found that the position of the category
boundary shifted according to metrical context: Ambiguous ratios (between 1:1 and 2:1) were more likely to be categorized as 2:1 in the context of
triple meter, whereas these same ratios were more likely to be categorized as 1:1 in the context of duple meter. Moreover, in a discrimination task
Clarke discovered a nonmonotonic discrimination function with a single peak at the category boundary, providing evidence for categorical perception.
Schulze (1989) criticized Clarke's findings on two grounds. First, he argued, because listeners were forced to choose between two categories, the
category boundary shift might not be perceptual; it might simply reflect a shift in response criterion. Second, because tempo was held constant,
Rhythm Categorization in Context
listeners might not be performing a rhythm discrimination task at all, but rather a time discrimination task. Thus, the evidence for categorical perception
of rhythmic patterns might be suspect as well. To control for these factors Schulze (1989) asked two musicians to learn numerical category tags for
prototypical rhythmic patterns. Then, during a response phase, the tempo of the patterns was roved randomly from trial to trial, and listeners rated
rhythms according to the degree to which they were perceived as realizations of the prototypical patterns. When the ratings were used to derive a
measure of discriminability, Schulze found nonmonotonic discrimination functions. But these were not the single-peaked functions of classic
categorical perception (Liberman et al., 1957); these discrimination functions contained multiple peaks. These results suggest that rhythmic patterns,
heard out of context, are not perceived categorically.
The specific question of whether or not rhythmic patterns are perceived categorically may be beside the point, however. First, a large branch of
research into categorical perception has called the entire categorical/continuous distinction into question (Macmillan, 1987). Second, Clarke reported
within-category discrimination that was much better than is typical of other categorical judgements. In support of this observation, Jones & Yee (1993)
have reported that time discrimination is better in metrical than nonmetrical contexts. Finally, it is quite clear that musicians (at least) categorize
rhythmic patterns all the time; hence the ability to notate musical performances. Thus, the more relevant questions would seem to be: what is the role
of context in the categorization of rhythmic patterns, and can this phenomenon tell us something about how people perceive metrical structure?
If Clarke's (1987) interpretation is correct, that meter provides the categories available for rhythmic pattern classification, then his finding may indeed
inform us as to the nature of meter perception. In dynamical systems terms, Clarke's data provide evidence of hysteresis in meter perception, the
persistence of a percept (e.g. a duple meter) despite a change in the stimulus that favors an alternative pattern (e.g. a triple meter). For example,
understanding the influence of context in rhythm perception could help to address the basic issue of whether meter perception is best described as a
linear (e.g. Scheirer, 1998; Todd, et. al. 1999) or a nonlinear (e.g. Large & Kolen, 1994; Large, 2000) dynamical system, because nonlinear dynamical
systems exhibit hysteresis, whereas linear systems do not.
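The hysteresis argument can be illustrated with a toy bistable system (a generic sketch, not any of the models cited): sweeping a control parameter up and then down, the state switches at different parameter values depending on the direction of the sweep, something a linear system cannot do.

```python
def sweep(params, x0, dt=0.01, steps=5000):
    """Relax dx/dt = a + x - x**3 at each value of the control parameter a,
    carrying the state over between values, and record the sign of x
    (+1 or -1), standing in for two percept-like states."""
    x, states = x0, []
    for a in params:
        for _ in range(steps):
            x += dt * (a + x - x ** 3)  # Euler integration toward a fixed point
        states.append(1 if x > 0 else -1)
    return states

a_up = [i / 10 - 1.0 for i in range(21)]    # a from -1.0 up to 1.0 in steps of 0.1
up = sweep(a_up, x0=-1.0)                   # ascending sweep
down = sweep(list(reversed(a_up)), x0=1.0)  # descending sweep
# On the way up the state stays negative well past a = 0; on the way down it
# stays positive well past a = 0: the earlier "percept" persists.
```

The ascending sweep flips from -1 to +1 at a larger value of a than the point at which the descending sweep flips back, which is exactly the signature of hysteresis described above.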
A Categorization Experiment
The initial objective of this study was to establish baseline observations regarding the role of metrical context in rhythm perception for both musician
and non-musician listeners. An additional objective was to establish an experimental methodology for studying context effects within a framework that
supports interpretation from a dynamical systems point of view. The main requirement for assessing the existence of hysteresis is the systematic
variation of a single stimulus parameter. Thus, in the case of rhythm perception, the stimulus should be gradually changed from one rhythmic figure to
another. However, this raises the difficult problem of distinguishing between hysteresis in the listener's perception and hysteresis in the response.
Listeners responding to a gradually changing stimulus parameter may persevere in their responses even after their percept has changed. There are other
interpretive problems as well. For example, are observed hysteresis effects truly perceptual, or do listeners persist in an earlier decision while the
stimulus parameter passes through values for which they are uncertain about what they are hearing? Hock, Kelso, & Schöner (1993) developed a
methodology for studying perceptual hysteresis in apparent motion patterns. It allows the study of perceptual changes resulting from varying the value
of one stimulus parameter using a simple modification of the psychophysical method of limits. The modified method of limits procedure minimizes the
potential for confounding perceptual hysteresis with response hysteresis by requiring a single response only after an entire sequence has been heard.
Here, the modified method of limits procedure is applied to the categorization of rhythmic patterns in an attempt to deal with interpretive problems.
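As a sketch of the design, the modified method of limits presents a monotonic parameter sweep and collects a single judgement only after the whole sequence. The step size and range below are hypothetical:

```python
def method_of_limits_trials(start=1.0, stop=2.0, step=0.125):
    """Build ascending and descending parameter sequences for the modified
    method of limits: the final-interval ratio moves monotonically from one
    rhythmic prototype (1:1) to the other (2:1), and the listener responds
    once per sequence. Step size and endpoints are illustrative assumptions.
    """
    n = round((stop - start) / step) + 1
    ascending = [round(start + i * step, 6) for i in range(n)]
    descending = list(reversed(ascending))
    return ascending, descending

asc, desc = method_of_limits_trials()
# asc sweeps 1.0, 1.125, ..., 2.0; desc is the same sweep reversed.
```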
Methods
Figure 1
The dashed curves show the results of the second two sessions. These indicate where the musicians first heard a transition away from duple in the
ascending condition (dashed black line) and where they first heard the transition away from triple in the descending condition (dashed gray line).
These clearly indicate an ambiguous region. But how should it be interpreted? Compare the two gray lines (farthest left) for M2, 600ms. These
measure the same boundary between duple and non-duple, but in different contexts (ascending and descending). Likewise, the two black lines
(farthest right) measure the boundary between triple and non-triple in different contexts. This leads to two preliminary conclusions. First, there are (at
least) three categories: duple, triple, and neither. Second, category boundaries shift depending on context.
With these observations in hand, context effects can be interpreted. For M1, 600ms, there is a slight hysteresis in the boundary between duple and
non-duple (gray lines). The boundary between triple and non-triple (black lines), however, shows a strong enhanced contrast effect. Enhanced contrast
is the opposite of hysteresis: the switch to an alternative percept before the stimulus parameter reaches a value that favors the alternative percept
(Tuller et al., 1994). This is a nonlinear effect that is often observed in perceptual switching studies and is discussed in more detail below. M2
displayed enhanced contrast at every boundary. Finally, in the 300ms condition, M1 displayed hysteresis at every boundary. Although not reported in
detail here, both non-musicians also displayed greater hysteresis in the 300ms condition.
Finally both listeners were interviewed regarding their perception of the patterns. M1 found the 600ms patterns slightly ambiguous: "Sometimes the
Proceedings paper
are taken from Experiment 3, and show subjects' performances after the pulses were switched off.
The data in Figure 1 show that performance is much more variable at slow speeds than at high speeds.
This is confirmed by standard deviations, which show consistently more variability for slower speeds:
11-159 msec for 1 note/sec and 1-10 msec for 12 notes/sec. Taken together, the data vividly illustrate
that the problem with this task is the control of motor performance at very slow speeds, not very fast
speeds.
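The per-speed variability figures quoted above are simple summaries of inter-onset intervals. A stdlib sketch, assuming a hypothetical data format of recorded onset times in milliseconds:

```python
import statistics

def tapping_variability(onsets_ms):
    """Summarize the timing of one tapping performance.

    onsets_ms: note-onset times in milliseconds (a hypothetical recording
    format). Returns (mean inter-onset interval, standard deviation of the
    inter-onset intervals), the kind of per-speed summary reported above.
    """
    iois = [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]
    return statistics.mean(iois), statistics.stdev(iois)
```

A performance at 1 note/sec would have a mean inter-onset interval near 1000 msec; the point above is that its standard deviation is far larger than that of a 12 notes/sec performance.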
Conclusions. The main result was that subjects played very well at high speeds, in terms of their
ability to keep up with the beat, shown by low variability both across and within subjects. By contrast,
at slow speeds, subjects' performances were much more variable.
This is a very surprising result, because it seems so intuitively obvious that performance should get
worse as speed increases. What is going on?
One explanation is that superior performance at high speeds could be due to the motor system shifting
into a modular state, placing it outside any interference from perceptual or cognitive processes.
However, Experiment 5 showed that perceptual monitoring is possible at very high speeds (the data
from Experiment 5 are very similar to the data from Experiment 3 shown in Figure 1).
Experiment 6 investigated which kind of perceptual monitoring was most useful by re-doing
Experiment 5 with participants wearing blindfolds. The results from Experiment 6 show that coupling
between the hands disappears and that at slow speeds, subjects slowed significantly.
Another experiment could be done which is the converse of Experiment 6, where the subjects can see
but not hear, i.e. suppress the sound from the piano keyboard. If the data were the same as
Experiments 2-5, we might be able to say that success at high speeds is due to a visuo-motor module
rather than a motor module by itself.
Evidence for a visuo-motor module has been found in the work of Milner & Goodale (1995), who
show that the processes for telling what an object is rely on different pathways in the visual
system from the processes that tell where an object is. The "what" processes involve the conceptual
representation of the object, while the "where" processes work in conjunction with the motor system.
These "where" processes have been shown by Milner & Goodale (1995) to be fast and automatic, and
can be thought of as a visuo-motor module. These findings are consistent with those of Experiment
6, which found that lack of visual information caused performance to suffer.
Key Words: polyrhythm, performance, expert, non-expert, modular, visuo-motor module.
References:
Fodor, J. A. (1983). Modularity of Mind. Cambridge MA: MIT.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral
mechanisms of behavior (pp. 112-131). New York: Wiley.
Proceedings abstract
Dr Adrian North
ACN5@LE.AC.UK
Background:
Many authors have considered the relationship between (particularly pop)
music and the broader zeitgeist in which it is experienced. However, to date
there has been little quantitative research on the subject.
Aims:
The study described in this paper aimed to investigate the relationship between
the content of pop music lyrics and various zeitgeist indicators, and also to
investigate trends in the evolution of pop music lyrics.
Method:
Lyrics were obtained for each song to have appeared in the British weekly Top 5
singles sales charts between March 1960 and December 1998. The lyrics were
scored by text analysis software to produce several lyrical variables (e.g.
optimism). Zeitgeist indicators (e.g. GDP, crime figures) were obtained from a
variety of sources (e.g. UK Government publications): some of these indicators
were available monthly, while others were available only quarterly or annually.
Results:
At the time of writing, analysis of the data has only just begun. The analysis will
describe whether pop music lyrics precede or follow changes in the zeitgeist
variables, and will describe how the lyrics have changed over the 38-year
period considered.
Proceedings paper
apatel@nsi.edu
ABSTRACT
Background. It is a commonplace notion that melodies are tone sequences which are neither too random nor too predictable
in their structure. Little is known, however, about patterns of brain response as a function of the structure of tone sequences.
Aims. This study sought to determine if differences in the statistical structure of tone sequences are reflected in measurable,
dynamic neural responses, and if sequences that are melody-like in their statistical properties have a distinct neural
signature.
Methods. Subjects listened to 1-minute long diatonic tone sequences while neural signals were recorded using 148-channel
whole-head magnetoencephalography (MEG). Sequences were random, deterministic (scalar), or one of two categories of
'fractal' sequences differing in their balance of predictability and unpredictability. (One of the fractal categories had
melody-like statistics). Amplitude-modulation of the tone sequences was used to generate an ongoing, identifiable neural
response whose amplitude and timing (phase) could be studied as a function of sequence structure.
Results. Ongoing timing patterns in the neural signal showed a strong dependency on the structure of the tone sequence. At
certain sensor locations, timing patterns covaried with the pitch contour of the tone sequences, with increasingly accurate
tracking as sequences became more predictable. In contrast, interactions between brain regions (as measured by temporal
synchronization), particularly between left posterior regions and the rest of the brain, were greatest for the tone sequences
with melody-like statistics. This may reflect the perceptual integration of local and global pitch patterns in melody-like
sequences.
Conclusions. Dynamic neural responses reveal a neural correlate of pitch contour in the human brain, and show that
interactions between brain regions are greatest when tone sequences have melody-like statistical properties.
1. Introduction
Melodies are a special subset of auditory sequences. Their acoustic raw materials can be extremely simple (e.g. a few dozen
pure tones), yet the arrangement of these materials in time can create structures that engage a host of interacting mental
processes, including chunking, melodic expectancy, and the perception of meter. Studying these processes is a principal goal
for the cognitive science of melody, and is being actively pursued by a number of research groups (e.g. Krumhansl et al.,
2000).
Discovering the neural correlates of melodic processing is a challenge for cognitive neuroscience. Progress in this area has
focused on average neural responses to individual events in sequences (e.g. via event-related potentials, or ERPs; Besson &
Faïta, 1995) or on the brain's average response to entire sequences (e.g. via positron emission tomography, or PET; Zatorre et
al., 1994). These techniques continue to provide valuable information, yet it is evident that to "tap into the
moment-to-moment history of mental involvement with the music" (Sloboda, 1985), techniques are needed that measure
patterns of neural activity as perception unfolds within individual sequences.
With these goals in mind, we set out to determine if aspects of tone sequence structure are reflected in dynamic neural
responses, and if melody-like sequences have a distinct neural signature. Full details of this study are given in Patel &
Balaban (2000). This paper emphasizes a qualitative understanding of our methods and results.
2. Methods
To explore brain responses, we used statistically-generated tone sequences. This allowed us to generate novel stimuli which
lay on a spectrum from random to deterministic in structure. We elected to use statistical tone sequences rather than
precomposed melodies so that the sequences would be unfamiliar to subjects, easily generated in quantity, and
mathematically well characterized. The latter two points were of particular importance because we were employing a novel
brain imaging technique and wanted to have good control over the stimuli.
All tone sequences were approximately 1 minute long, consisting of ~150 pure tones (415 msec each) with no temporal
gaps. Sequences were diatonic, and ranged between A3 (220 Hz) and A5 (880 Hz) in pitch. Four structural categories of
sequences were employed: random, deterministic (musical scales), and two intermediate 'fractal' categories of constrained
variation which differed in their balance of predictability and unpredictability (Schmuckler & Gilden, 1993). These
categories were given mathematical names in accordance with the technique used to generate them (see Note 1): 1/f ("one
over f") and 1/f2 ("one over f squared").
A qualitative understanding of these categories is possible without delving into the underlying mathematics. In random
sequences each successive pitch is chosen independently of the previous one, and there are no long term pitch trends.
Deterministic sequences represent the opposite case: they consist entirely of long-term pitch trends (predictable stair-like
patterns) with no short-term unpredictability. The fractal sequences are intermediate. 1/f sequences have a hint of long term
pitch trends but still have much unpredictable variation from one pitch to the next. 1/f2 sequences are strong in long term
pitch trends, but retain a small amount of unpredictability in the behavior of successive tones.
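Note 1 states that the fractal categories were produced by inverse Fourier transforms of power spectra with different slopes. As a rough illustration of that technique (not the authors' actual code; the diatonic mapping, the random seed, and the 15-degree two-octave range are assumptions for this sketch), a 1/f^beta contour can be generated in Python:

```python
import numpy as np

def fractal_contour(n_tones=150, beta=1.0, seed=0):
    """Generate a pitch contour whose power spectrum falls off as 1/f^beta,
    via the inverse Fourier transform of a sloped spectrum with random phases.
    beta=0 gives a random (white) contour; beta=2 gives strong long-term trends."""
    rng = np.random.default_rng(seed)
    freqs = np.arange(1, n_tones // 2 + 1)
    amplitudes = freqs ** (-beta / 2.0)           # amplitude ~ f^(-beta/2), power ~ 1/f^beta
    phases = rng.uniform(0, 2 * np.pi, len(freqs))
    spectrum = np.zeros(n_tones, dtype=complex)
    spectrum[1:len(freqs) + 1] = amplitudes * np.exp(1j * phases)
    contour = np.fft.ifft(spectrum).real
    # Map the continuous contour onto diatonic scale degrees (0-14 spans
    # two octaves, e.g. A3 to A5, as an illustrative assumption)
    degrees = np.interp(contour, (contour.min(), contour.max()), (0, 14))
    return np.round(degrees).astype(int)

contour = fractal_contour(beta=2.0)
```

Setting beta=0 would approximate the random category and beta=1 and beta=2 the two fractal categories; the deterministic scales would be written out directly rather than generated this way.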
Examples of pitch contours from the different sequences are shown in Figure 1, f-i (black/dark lines). Note the different
shapes of the pitch contours in the four conditions: the random sequence has no discernible long-term patterns. The 1/f pitch
contour has some evidence of long-term patterns (e.g. the general dip in the pitch contour in the middle of the sequence,
followed by a slow climb in average pitch), but retains a good deal of unpredictable jagged pitch movement. The 1/f2 pitch
contour has clearly discernible long-term pitch patterns with relatively little unpredictable jaggedness. The pitch contour of
the scales moves up and down in a completely predictable way. Sound examples of all tone sequence categories can be
heard at: www.nsi.edu/users/patel/tone_sequences.
Subjects (n=5 right handed males, 2 with musical training) were familiarized with the different stimulus categories in a
training session where examples of each category were presented along with an arbitrary category label (the numbers 1-4).
Subjects quickly learned to identify the different categories, and during the experiment, classified novel sequences by their
category with little difficulty. The experiment consisted of 28 such sequences, 7 per category. Stimuli in each category were
equally distributed among seven Western diatonic modes (ionian, dorian, phrygian, lydian, mixolydian, aeolian, and
locrian). Each subject heard a unique set of stimuli, with the exception of the scales, which were identical across subjects.
During stimulus presentation, neural data were recorded using 148-channel whole head magnetoencephalography (MEG).
MEG measures magnetic fields produced by electrical activity in the brain, providing a signal with similar time resolution to
electroencephalography (EEG) but with certain advantages relating to source localization and independence of signals
recorded from different parts of the sensor array (Lewine & Orrison, 1995).
We used a novel method to detect stimulus-related neural activity. Each sequence was given a constant rate of amplitude
modulation (41.5 Hz), as shown in Figure 1 a-c. Fig 1a shows frequencies from a 4-second portion of a tone sequence. Fig
1b shows the associated amplitude waveform. Figure 1c provides a detail of a small piece of the waveform, showing the
constant amplitude modulation frequency (41.5 Hz, blue/dark line) overlaid on the changing carrier frequency. This
amplitude modulation gave the tone sequences a slightly warbly quality, without disrupting their perceived pitch pattern:
listeners heard them as sequences of pitches at the underlying pure tone frequencies.
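The amplitude modulation described above can be sketched as follows. The 415 msec tone duration and 41.5 Hz modulation rate come from the text; the sample rate and modulation depth are illustrative assumptions:

```python
import numpy as np

def am_tone(carrier_hz, dur=0.415, sr=44100, mod_hz=41.5, depth=1.0):
    """Sinusoidally amplitude-modulate a pure tone at mod_hz.
    The listener hears the carrier pitch with a slight 'warble'."""
    t = np.arange(int(dur * sr)) / sr
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_hz * t)  # modulator
    return envelope * np.sin(2 * np.pi * carrier_hz * t)     # carrier

tone = am_tone(440.0)  # A4 carrier with 41.5 Hz amplitude modulation
```

Concatenating such tones for each pitch in a contour would yield a gapless, continuously modulated sequence like the stimuli described here.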
It is known from auditory neuroscience that continuous amplitude modulation of pure tones results in a detectable brain
response at the amplitude modulation frequency (Galambos et al., 1981; Hari et al., 1989), known as the auditory
"steady-state response" (SSR). This response is visible in a power spectrum of the brain signal, which shows a peak at the
amplitude modulation frequency. Fig 1d shows a 4-second piece of brain signal, and Fig1e shows two corresponding power
spectra, based on two successive 2-second portions of the signal. A peak at 41.5 Hz is clearly visible.
Thus amplitude modulation results in detectable stimulus-related cortical activity. We studied properties of this activity
during individual sequences. In particular, for each sequence heard by a subject we studied the amplitude and timing
characteristics (phase) of this activity in contiguous two-second epochs from each channel. One amplitude and phase value
of the SSR was obtained from each successive 2-second epoch of the channel's brain signal via a Fourier transform, yielding
approximately 30 data points x 148 channels per sequence.
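This epoch-wise Fourier analysis can be sketched minimally as follows (the function name and parameters are illustrative, not the authors' implementation). Note that with 2-second epochs the frequency resolution is 0.5 Hz, so 41.5 Hz falls exactly on a Fourier bin:

```python
import numpy as np

def ssr_amplitude_phase(signal, sr, target_hz=41.5, epoch_s=2.0):
    """For each successive non-overlapping epoch of one channel's signal,
    return the amplitude and phase of the Fourier bin nearest target_hz."""
    n = int(epoch_s * sr)
    amps, phases = [], []
    for start in range(0, len(signal) - n + 1, n):
        spectrum = np.fft.rfft(signal[start:start + n])
        freqs = np.fft.rfftfreq(n, d=1.0 / sr)
        k = np.argmin(np.abs(freqs - target_hz))   # bin at 41.5 Hz
        amps.append(np.abs(spectrum[k]))
        phases.append(np.angle(spectrum[k]))
    return np.array(amps), np.array(phases)
```

Applied to a ~60-second sequence this yields roughly 30 amplitude and phase values per channel, matching the data dimensions described above.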
Since a good deal of our analysis concerns phase information, it is worth giving a brief conceptual explanation of phase. By
amplitude modulating our tone sequences at 41.5 Hz, we are introducing an oscillatory signal into the brain at that same
frequency. This causes an oscillatory response (the SSR) at that frequency in certain brain regions. The degree to which the
oscillatory brain response lagged the time-referenced input signal is measured by the phase of the brain response at 41.5 Hz.
We studied the amplitude and phase of the brain response over time during individual sequences heard by our subjects.
3. Results
Our first finding was that the phase of the measured brain signal varied with the pitch of the tone sequence. As pitch
increased, phase advanced (corresponding to a decreased lag between stimulus and brain response), and vice versa. This general
result is depicted in Fig 1e, which shows two spectra, one taken during a sequence of low pitches (Fig 1a, left half) and one
taken when pitches were higher (Fig 1a, right half). Fig 1e shows that the peak of the SSR remains steady at 41.5 Hz, but its
phase (inset arrow) advances as the average pitch of the tone sequence increases. This relationship between SSR phase and
carrier frequency was suggested by early work (Galambos et al., 1981), and has been independently confirmed by another
laboratory (John & Picton, 2000). It is likely to be due to the tonotopic layout of the basilar membrane in the human ear,
where higher frequencies are closer to the oval window and hence stimulated earlier than lower frequencies.
Our next finding was that the phase of the brain response tracked the pitch contour a subject was hearing, and that this
tracking improved as the sequences became more predictable in structure, with the best tracking for musical scales.
Examples of phase-time contours (red/light lines) overlaid on their corresponding pitch time contours (black/dark lines) are
shown in Fig 1 f-i, which illustrates how tracking improves across the stimulus conditions. Each subject showed a number of sensor
locations where this 'phase tracking' of pitch was observed. Across subjects, these locations tended to be in fronto-temporal
regions, with a right-hemisphere bias (Patel & Balaban, 2000, Fig 2). A similar set of locations was identified when we
looked for sensors where the amplitude of the SSR was strong. However, we found no evidence that the amplitude of the
SSR correlated with the heard pitch contour.
Knowing that the phase of the brain response contained information about stimulus properties, we then examined
patterns of phase coherence between different brain regions. Phase coherence does not measure the lag between an
oscillatory signal and brain response but rather the stability of the phase difference between oscillatory activity in different
brain areas. Thus phase coherence is a measure of temporal synchronization between brain regions. If two brain areas show
greater synchronization during a given condition, this is suggestive of a greater degree of functional coupling between those
areas (see Bressler, 1995 for a review).
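The stability of a phase difference can be quantified by averaging the unit vectors of the epoch-by-epoch phase differences and taking the magnitude of the result. The following is a standard phase-locking measure, offered as a sketch of the idea rather than the exact coherence statistic used in Patel & Balaban (2000):

```python
import numpy as np

def phase_coherence(phases_a, phases_b):
    """Stability of the phase difference between two channels' phase series:
    1.0 = perfectly constant difference, near 0 = random differences."""
    diff = np.asarray(phases_a) - np.asarray(phases_b)
    return np.abs(np.mean(np.exp(1j * diff)))   # magnitude of mean unit vector
```

Two channels whose 41.5 Hz responses keep a fixed relative timing across epochs score near 1; channels with unrelated timing score near 0.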
We found that across subjects, the different conditions were characterized by differing degrees of phase coherence. Random
sequences generated less phase coherence than all other categories, and among the structured categories, 1/f2 sequences
generated the greatest degree of phase coherence (Patel & Balaban 2000, Fig 3). Interestingly, statistical research on Western
music indicates that melodic tone sequences have approximately 1/f2 statistics (Nettheim, 1992; Boon & Decroly, 1995),
suggesting that music-like sequences generated more brain interactions than other sequences.
To better understand the nature of these interactions, we examined topographic patterns of phase coherence, subdividing the
brain into four quadrants (anterior and posterior x left and right). We found that the greater phase coherence of 1/f2
sequences was driven by interactions between the left posterior hemisphere and the rest of the brain, including the two right
hemisphere quadrants. This is of interest because neuropsychological studies of brain-damaged patients suggest that left
superior temporal regions are involved with the discrimination of precise interval sizes, while right fronto-temporal circuits
are involved with the perception of more global contour patterns (Liégeois-Chauvel et al., 1998; Patel et al., 1998). Thus the
observed pattern of coherence may reflect the dynamic integration of local and global pitch perception, and suggests that
this integration is greatest when tone sequences resemble musical melodies.
4. Discussion
This study has shown that it is possible to extract a signal from the human cerebral cortex which reflects the pitch contour an
individual is hearing. The accuracy with which this signal reflects the pitch contour improves as the pitch sequence becomes
more predictable. Thus top-down influences of musical expectancy may shape this brain signal.
The basis of this signal is temporal information in cortical activity. When the amount of activity was examined, no
relationship with pitch contour was observed. This suggests that dynamic imaging techniques have an important role to play
in the study of music perception, complementing techniques sensitive to the amount of neural activity but insensitive to the
fine temporal structure of that activity (e.g. functional magnetic resonance imaging, fMRI).
Dynamic imaging techniques also offer the opportunity to study how brain areas interact during perception. It is clear from
decades of neural research that the brain is divided into different regions, each of which has a special role to play in
perception and cognition. Yet it is also clear that these brain areas must interact to form coherent and unified percepts.
Complex patterns such as music and speech engage multiple brain regions, and sequences with different perceptual
properties may be distinguished by the pattern of brain interactions they engender rather than by the particular brain regions
which respond to them.
Using phase coherence, we examined brain interactions as a function of stimulus structure and found that sequences with
melody-like statistics engendered the greatest degree of neural interactions. In particular, we found evidence for strong
functional coupling between the left posterior hemisphere and right hemisphere regions during the perception of melody-like
sequences. This may reflect the perceptual integration of local and global pitch patterns, and suggests that one neural
signature of melody is the dynamic integration of brain areas which process structure at different time scales.
Future work will use this technique to examine brain interactions as a function of stimulus structure in real melodies. This
may provide one way to quantify the perceptual coherence of melodies in individuals who cannot easily give details of their
perception, such as non-musicians and infants.
Figure 1. (a-d): Example of stimulus and brain response over 4 seconds: (a) Tone frequencies; (b) Stimulus waveform; (c)
Waveform detail (150 msec), showing constant modulating frequency (41.5 Hz, blue/dark line) overlaid on changing carrier
frequency; (d) Neural signal from one sensor. (e) Successive 2-second spectra of neural signal. The brain signal shows an
energy peak at 41.5 Hz, whose phase (inset arrow) varies with carrier frequency. (f-i): Phase-tracking of individual tone
sequences. Pitch-time contours (black/dark lines) illustrate the four different stimulus categories. Associated neuromagnetic
phase-time series (red/light lines) from a single sensor during one trial in one subject were randomly drawn from the top
10% of sensor correlation values for each stimulus. The correlation between the resampled pitch-time series and the
neuromagnetic phase-time series is given in the inset to each graph.
Acknowledgements
This work was supported by the Neurosciences Research Foundation as part of its research program on Music and the Brain
at The Neurosciences Institute.
NOTES
1. Inverse Fourier transform of power spectra with different slopes (see Patel & Balaban 2000 for details).
References
Besson, M. & Faïta, F. (1995). An event-related potential (ERP) study of musical expectancy: Comparison of musicians with nonmusicians. J. Exp. Psych: Human
Perception and Performance, 21, 1278-1296.
Boon, J.P. & Decroly, O. (1995). Dynamical systems theory for music dynamics. Chaos 5, 501-508.
Bressler, S. (1995). Large-scale cortical networks and cognition. Brain Research: Brain Research Reviews, 20(3), 288-304.
Galambos, R., Makeig, S. & Talmachoff, P.J. (1981). A 40-Hz auditory potential recorded from the human scalp. Proc. Natl. Acad. Sci. USA 78, 2643-2647.
Hari, R. Hämäläinen, M., & Joutsiniemi, S.-L. (1989). Neuromagnetic steady-state responses to auditory stimuli. J. Acous. Soc. Am. 86, 1033-1039.
John, M.S. & Picton, T.W. (2000). Human auditory steady-state responses to amplitude modulated tones: phase and latency measurements. Hearing Research, 141,
57-79.
Krumhansl, C., Louhivuori, J., Toiviainen, P., Järvinen, T. & Eerola, T. (2000). Melodic expectation in Finnish spiritual folk hymns: convergence of statistical,
behavioral, and computational approaches. Music Perception, 17(2), 151-196.
Lewine, J.D. & Orrison, W.W. (1995). Magnetoencephalography and magnetic source imaging. In: Functional Brain Imaging (W.W. Orrison et al., ed): 369-417. St. Louis: Mosby.
Liégeois-Chauvel, C., Peretz, I., Babaï, M., Laguitton, V. & Chauvel, P. (1998). Contribution of different cortical areas in the temporal lobes to music processing.
Brain 121, 1853-1867.
Patel, A.D. & Balaban, E. (2000). Temporal patterns of human cortical activity reflect tone sequence structure. Nature, 404, 80-84.
Patel, A.D., Peretz, I., Tramo, M. & Labrecque, R. (1998). Processing prosodic and musical patterns: a neuropsychological investigation. Brain and Language 61,
123-144.
Schmuckler, M.A. & Gilden. D.L. (1993). Auditory perception of fractal contours. J. Exp. Psychol: Human Percep. & Perform. 19, 641-660.
Sloboda, J. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford: Clarendon Press.
Zatorre, R.J., Evans A.C. & Meyer, E. (1994). Neural mechanisms underlying melodic perception and memory for pitch. Journal of Neuroscience, 14(4), 1908-1919.
Proceedings paper
"It [music] has no zoological utility; it corresponds to no object in the natural environment; it is a pure
incident of having a hearing organ...our higher aesthetic, moral, and intellectual life seems made up of
affections of this collateral and incidental sort, which have entered the mind by the back stairs, as it
were, or rather have not entered the mind at all, but got surreptitiously born in the house."
In recent years the historical origins of human cognitive abilities have become an intense topic of discussion. Many
have argued quite convincingly that evolutionary forces have shaped cognitive capacities to assist us in solving
environmental problems, thereby increasing reproductive fitness. Numerous cognitive abilities that are present in
humans have been considered in light of natural selection, and language in particular has received a large amount of
attention (e.g. Pinker and Bloom, 1990; Pinker, 1994; Jackendoff, 1994). More recently, the evolutionary history of
musical ability has been examined, largely from an adaptationist perspective as a trait shaped by natural selection
(Wallin et al., 2000). However, processes other than direct adaptation can also account for its origins and are consistent
with a modern evolutionary framework. Two such processes, cultural transmission and exaptation, seem especially
suited to an evolutionary theory of the origins of music.
Examining the evolutionary origins of music cognition or any other psychological phenomenon requires grappling with
some of the most central and difficult issues in cognitive psychology, cultural anthropology, and evolutionary biology.
Within the traditional framework of the social sciences, behavioristic psychology was given the limited role of
explaining the general learning mechanisms through which culturally transmitted knowledge is acquired. More
recently, the more domain-specific and modular cognitive psychology has formed a stronger allegiance with the
biological sciences, and has attempted to explain the structure of the adult mind in terms of evolutionarily-derived
mental mechanisms rather than acquired cultural traditions (see Tooby and Cosmides, 1992). An extreme adaptationist
version of psychology, however, is no improvement over an equally extreme behaviorism. By taking one perspective
over the other, the psychologist risks losing half of the field's potential for exploring the origins of human mental
processes.
We believe that arguments concerning the evolutionary history of a cognitive phenomenon such as music and its
corresponding neural structure must first deal with two issues. First, both evolutionary and cultural processes can
explain the origins of these cognitive structures in the adult mind. Second, both adaptationist and non-adaptationist
processes can account for the evolutionarily-derived structures. If one is going to argue that music or any other mental
process is a biological adaptation that has been shaped by natural selection, two essential cases must be made. First,
one must be prepared to argue that the process is based on domain-specific innate universals that are not specifically
devoted to other cognitive abilities. Otherwise its developmental origins may be explained more parsimoniously via
mechanisms of learning and cultural transmission, rather than through natural selection. Second, one must be prepared
to argue that the purpose of this mental process is the same function that facilitated its selection. Otherwise its
evolutionary origins may be explained more parsimoniously via mechanisms of exaptation rather than adaptation.
Effects of Maturation and Learning on Cognitive Structure. Cognition is the product of a nervous system as it interacts
with the environment. Conceptually, we identify two processes by which the structure, and therefore the behavior, of
the nervous system can be determined and changed. The first process includes relatively dramatic changes that occur
through the interaction of an organism's genetic makeup with the environment during development. We often call such
changes maturation. These changes represent the unfolding of the genetic blueprint of the organism. The second
process includes relatively subtle yet still permanent changes that occur in response to environmental stimuli. We often
call these changes learning. Although these processes vary in the magnitude of their effect on the organism, both are
dependent upon the interaction of an organism's genetic endowment and its environment. In the case of maturation, an
environmental context in which genetic programs can unfold is required. In the case of learning, genetically specified
learning mechanisms upon which environmental information can act are required. Because these two processes share a
similar outcome, which is the specification of the structure of the nervous system, it is difficult to distinguish where
one ends and the other begins. It should be clear that we are not speaking of the traditional dichotomy between
genetics and the environment where a greater amount of direct genetic specification means a lesser degree of
environmental influence. Genetics and the environment, broadly construed, produce the structure of an organism
through their concerted actions and these two paths in many cases cannot be considered independently of one another
(see Tooby and Cosmides, 1992).
Historically, however, maturation and learning have been considered two separate processes, and have been cast
throughout the history of philosophy and psychology as dichotomies between nativism and empiricism, innate and
acquired traits, nature and nurture, and genetics and the environment. Despite this deep historical division in the way we
think of an organism's unchanging biological endowment and that which is acquired from the environment, these
processes overlap and interact in interesting and potentially powerful ways. For instance, in organisms with an evolved
system of learning and memory, biologically relevant information can also take the form of shared group knowledge.
This information can be transmitted either through observing older members of the group or through encoding in a
representational system. Organisms that use a representational system must share the ability to use such a code, the
most common example of which is language. The sharing and intergenerational transfer of such information has been
called cultural evolution and cultural transmission (see Key and Aiello, 1999) as well as memetic transmission
(Dawkins, 1976). Cultural evolution in humans not only carries simple types of information, but can also embody very
complex constructions such as religion and social customs. According to Mead (1964), "the term cultural transmission
covers a series of activities, all essential to culture, which it is useful to subdivide into the capacity to learn, the
capacity to teach, and the capacity to embody knowledge in forms which make it transmissible at a distance in time or
space."
Differences between Biological and Cultural Transmission. Although heavily intertwined there are clear differences
between biological and cultural information transfer. Biological information, the information that is carried by an
individual's genetic makeup, is resistant to change and therefore has high fidelity across many generations. It cannot be
altered within the lifetime of an organism, but only between generations. In addition, a substantial number of changes
that do occur in the genetic code are eliminated, either because they are incompatible with life or because they decrease
the relative reproductive success of the organisms that possess them. Other changes to the genetic code are
neutral, conferring neither an immediate adaptive advantage nor a disadvantage (Patterson, 1999). Bits of cultural
information, or memes, are also capable of transmitting information that is relevant to reproductive success. Unlike
genetic transmission, memetic transmission has relatively low fidelity and can incorporate or delete changes multiple
times across a single generation (Dobzhansky and Boesiger, 1983). In addition, we are not burdened with an
unalterable set of memetic information. Humans can modify or disregard information that has become obsolete and is
no longer useful. The invention of written languages has greatly diminished the permanent loss of cultural information.
However, if one considers only the active set of commonly accepted cultural information, we are faced with an ever
changing entity where only the most critical pieces of information retain stability in the knowledge set. The cognitive
abilities, such as language and social group formation, that are required to use, reshape, and convey such a knowledge
set across generations are stable, and perhaps biologically-based cognitive adaptations. The temporal aspects and
fidelity of these two modes of transmission are quite different from each other but do seem to share the ability to be
acted upon by selection when the solutions they provide have utility for the organism.
It is easy to understand how such a system of culturally carried information that increases the reproductive fitness of
members of the social group could provide a powerful basis for reproductive success. So powerful is the potential of
such a system that some researchers have argued that it is of greater significance than biological evolution in
explaining our unique cognitive abilities.
"That adaptation to cultural change is more important to humans than adaptation by genetic change is
incontrovertible; changed genes are transmitted only to direct descendants of the individuals within
which the changes arise. Many generations of selection are needed in order to confer the benefits of the
changed genes on the whole species. Changed ideas, skills, or inventions can be transmitted, in
principle, to any number of persons within a single generation." (Dobzhansky and Boesiger, 1983)
Both an extreme nativist and an extreme empiricist argument can simultaneously incorporate memetic and genetic
modes of information transfer. For the empiricist, the genetics of an organism can specify a general learning
mechanism that is capable of acquiring musical ability. Memetics then provides the information that this general
system uses to produce music cognition. In this view it is the general learning mechanism that provides a reproductive
advantage and the construction of musical ability by this mechanism is largely irrelevant to its evolutionary history. For
the nativist, musical ability is a direct outcome of a specific neuronal architecture that has provided reproductive
success to the organism and has been open to natural selection. These arguments differ in the degree of specificity that
the biology of the nervous system carries to the situation and also differ in the emphasis of where selection has had its
effect, on the general learning mechanism or on more specific mechanisms.
Chomskian Learning Theories. Perhaps no one has argued for specific cognitive mechanisms as convincingly as Noam
Chomsky (1975), who has described development in terms of species- and domain-specific learning theories. A
learning theory is the set of innate mental mechanisms that all members of a particular species use to acquire
information about a particular domain from the environment. In some cases, learning mechanisms may seem to be
relatively general and rely on laws of association. In others, a high degree of innately-specified "knowledge" must be
present in order for the organism to draw the right conclusions. Chomsky's learning theory of interest was for language
acquisition, but this kind of knowledge can also be extended to other domains, including music. Lerdahl and
Jackendoff (1985) have provided one detailed hypothesis of the mental mechanisms we have for structuring musical
information.
Adults possess a great deal of knowledge about music that is represented in the structure of their brains. As we have
suggested, the first question that evolutionary musicology has to face is the respective roles of biological evolution and
cultural transmission in getting that structure in place. Although it does not make sense to expect a clear distinction
between knowledge that is innately specified and knowledge that is learned, it does make sense to distinguish between
(1) knowledge that is acquired via the unfolding of developmental programs interacting with environmental universals
and (2) knowledge that is acquired via the process of enculturation, having accumulated throughout thousands of
generations of human culture. In particular, the more recent cultural developments, such as the European
tonal-harmonic system, are likely not to have been involved in any co-evolutionary processes. Natural selection
certainly played a role in setting up the general learning mechanisms and linguistic abilities that allow cultural
transmission. However, a reliance upon biological adaptation to account for specific kinds of knowledge that could be
explained by cultural transmission may not be warranted.
The Example of Language. Another reason why learning theories are important to the discussion of music evolution is
the example of language itself. It has been argued that language is an adaptation shaped by natural selection (Pinker
and Bloom, 1990). The primary reason why this argument is compelling is not based on the current utility of language;
it is based on the fact that the entire field of linguistics provides us with a learning theory for the domain of language
that is unparalleled by research in any other cognitive domain. The kinds of innate constraints we have for learning
language are complex, specialized, and seem to be well-designed for the task they serve. This kind of
argument-from-design for language as an adaptation is not possible without first having the foundations of a learning
theory for language that was provided by Chomsky and his followers.
If one is going to talk about music and evolution, a learning theory for musical knowledge would help reveal the kinds
of knowledge that are genetically specified. This is the information that has been carried throughout human history and
is open to traditional forms of biological evolution. Certainly there is more to music than this biological system, but it
is the domain of cultural transmission and not biological evolution that must explain the elements that are not innate.
We will now set aside the issue of culturally transmitted contributions to the structure of the adult mind and turn the
discussion to these evolved cognitive mechanisms.
The Distinction between Function and Use in Evolutionary Biology. This distinction between complex mental structure
that originated because of selection for a behavior of interest (in this case musical behavior) as opposed to another
behavior is an echo of the distinction between function and use in the field of evolutionary biology (see Williams,
1966; Gould and Lewontin, 1979). A function of a given structure is the purpose for which the structure was selected,
while a use is a purpose that the given structure allows that is not the purpose for which it was selected. It has also been
argued that the term "adaptation" be reserved for structures whose modern functions of interest are the same purposes
for which they were selected; structures whose original functions and modern uses differ are sometimes referred to as
"exaptations" (Gould and Vrba, 1982). The issue of teasing apart original function from current use is no less of a
problem when we are dealing with cognitive structure than when we deal with macroscopic structure (Lewontin, 1990).
These distinctions are crucial if one is going to label a particular behavior an evolutionary adaptation. In order to make
such an argument, that behavior must be the function (and not merely a use) of the neural structures that specify it. In
the case of music, the question is whether, on the one hand, the innate abilities that lend themselves to musical
processing and behavior were selected precisely because they do so, making music the function of these innate
mechanisms as well as an adaptation, or whether, on the other hand, these neural structures exist because of
selection pressures in other domains, making music one of perhaps many uses, or exaptations, of these
mechanisms. Arguments have been made extensively on both sides of this issue as it pertains to language (see Pinker
and Bloom, 1990).
The Fallacy of Inferring Function from Use. The importance of this distinction between original function and current
use has been either underestimated (Huron, 1999) or explicitly denied (Miller, 2000; Brown, 2000) in recent treatments
of music evolution. Specifically, consider the following statement by Brown (2000).
Music making has all the hallmarks of a group adaptation and functions as a device for promoting group
identity, coordination, action, cognition, and emotional expression. Ethnomusicological research cannot
simply be brushed aside in making adaptationist models... Music making is done for the group, and the
contexts of musical performance, the contents of musical works, and the performance ensembles of
musical genres overwhelmingly reflect a role in group function. The straightforward evolutionary
implication is that human musical capacity evolved because groups of musical hominids outsurvived
groups of nonmusical hominids due to a host of factors related to group-level cooperation and
coordination (pp. 296-297).
Brown not only has inferred evolutionary function from current use, but has also implied that by disagreeing with him
we insult the field of ethnomusicology. We would disagree on both points. First, decades of evolutionary biological
theory exist that would take major issue with the "straightforward" conclusion derived above. Second, from the cultural
perspective disagreeing only implies that certain musical phenomena fall in the domain of cultural anthropology rather
than evolutionary biology. In fact, comparative ethnomusicological research may be an essential tool if we are to
determine which elements of music are the products of evolution and which are the products of culture.
Often closely following on the heels of the modern function argument is the claim that the evidence for a theory of
music evolution will be found in studies of modern evolutionary fitness (see Miller, 2000) and genetics (see Huron,
1999). In other words, we should examine the purposes of musical behavior in modern human society and observe if
such behavior leads to different levels of individual reproduction or is associated with differences in genetics. This
particular kind of empirical evidence is not only likely to be impractical (see Lewontin, 1990), but fundamentally
incapable of answering the critical question at hand (see Symons, 1992), as we explain next.
The Fallacy of Inferring Heritability from Reproductive Success. Consider the first question of whether musical ability
leads to increased reproduction. First, if musical ability had been selected as an adaptation, the once-essential
variability may by now have been significantly reduced in modern humans by the forces of selection. Second, even if
there were a significant correlation with reproductive rate, it would likely require an overwhelmingly large data set in
order to observe it. Third, assuming that there were a correlation and that we could measure it, we would then have to
prove somehow that the differences in musicality were based on differences in genetics; we would have to show that
the differences are heritable. This makes the strength of argument from this first question entirely dependent on a
particular answer for the second: that differences in musical ability are related to differences in genetics.
The Fallacy of Inferring Adaptation from Heritability. Are there genes for music? Symons (1992) discusses how there
are really three such questions, which we apply to music here. In the ontogenetic sense, the answer is a trivial yes to
any question about genes and behavior. Everything about being human is inextricably linked to numerous genetic
processes. In the heritability sense, the answer is yes if any population variance in musical ability is caused by genetic
variance. The answer to this question is probably yes, but this would not say anything about whether or not music is an
adaptation. One might find that good readers and poor readers have statistically different genetics, but that does not
change the fact that written languages are a cultural invention. Lastly, in the adaptationist sense, the answer is yes if
there is evidence from studies of phenotypic design. If experimenters are interested in the adaptationist form of the
question, but design experiments that test for heritability, they have not accomplished anything. Just as careful
consideration of the question of reproduction rates requires asking a question about genetics, careful consideration of
the question of genetics requires asking a question about phenotypic design. Why not begin with the most fundamental
issue of all, the issue that in the long run will require an answer if you start out by examining reproduction and
genetics? This is the issue of special design.
Special Design for Music? The Relationship of Music to Other Cognitive Domains
The primary reason for skepticism regarding a musically adaptive explanation of the origins of neurological
connectivity relating to music is the fact that music shares mechanisms with other domains. The mere existence of
other kinds of thinking and other kinds of behavior that use cognitive mechanisms similar to that of music decreases
the odds that music is the evolutionary function of such mechanisms. Furthermore, many of these other forms of
cognition have a more compelling argument from design. These shared mechanisms fall into two general categories,
which we will consider below. The first category includes general mechanisms of perceptual organization and
cognitive representation. The second category includes the more specific domains of language and emotion, whose
mechanisms are possibly more specific and specialized than those of the first group. Similar arguments could be
extended to the domains of timing, motor control, and other capabilities related not only to perception but performance
as well (Justus and Hutsler, in preparation).
Helmholtzian Perceptual Organization. If one were to ask what perception is for, in the evolutionary sense, a
reasonable answer might be that perception is designed to provide an organism with useful knowledge about its
environment. In the auditory realm, a fundamental problem is to segregate the large set of frequency information
entering the ear into categories corresponding to different environmental objects and events. Bregman (1990) has
termed this process "auditory scene analysis." In the natural world, most sound sources produce harmonic vibration. If
we assume that part of the evolutionary function of the auditory system is to categorize frequencies on the basis of
what environmental objects and events are producing them, then it should not be surprising that the system makes the
assumption that tones with similar harmonic spectra should be categorized together, as if they were part of the
harmonic vibrations emanating from a single source. Such heuristics recall Helmholtz's (1867/1925; 1877/1954) ideas
of unconscious inference in perception.
In many cases, fundamentals of our perception of music seem to be governed by such Helmholtzian principles of
perceptual organization. Candidate universal "musical" principles that fall into this category include octave
equivalence, consonance, perceptual fusion, and pitch. Here we have cases in which a universal constraint of musical
processing, and perhaps even an innate universal, seems to be either an implementation (in the case of fusion and pitch)
or a direct consequence (in the case of octave equivalence and consonance) of harmonicity-detection mechanisms that
more plausibly evolved as a perceptual heuristic. It would not make sense to propose a separate adaptive value for
these mechanisms as they apply to music making; they were already put in place by another system. In fact, there may
be no need to hypothesize special innate mechanisms for learning to perceive pitch and the relationships between
musical intervals at all. Given that our adult knowledge in this case is reflecting acoustic regularities in the
environment, only the general mechanisms of perceptual learning are required (Bharucha and Mencl, 1996). It is
reasonable to suppose that at least pitch perception based on harmonicity and octave equivalence are universal features
of human auditory processing, either innate or learned as an unavoidable consequence of experience with physical
universals, given that infants seem to categorize frequency complexes based on pitch and regard octave-spaced tones as
equivalent (see Clarkson and Clifton, 1985; Demany and Armand, 1984).
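The harmonicity heuristic described above can be illustrated with a toy computation: given a set of partial frequencies and a candidate fundamental, score how many partials fall near integer multiples of that fundamental. This is a minimal sketch, not a model of the auditory system; the tolerance value and the frequencies below are arbitrary assumptions for illustration.

```python
def harmonicity(freqs, f0, tol=0.03):
    """Fraction of partials lying within a relative tolerance of a multiple of f0."""
    hits = 0
    for f in freqs:
        n = max(1, round(f / f0))          # nearest harmonic number
        if abs(f - n * f0) <= tol * n * f0:  # within tolerance of that harmonic
            hits += 1
    return hits / len(freqs)

# Partials of a single 200 Hz source fit its harmonic series and would fuse;
# an unrelated 330 Hz partial does not fit and would be heard as separate.
mixed = [200, 330, 400, 600]
score = harmonicity(mixed, 200)  # 3 of the 4 partials fit the 200 Hz series
```

Grouping by goodness of harmonic fit in this way is one simple reading of the "tones with similar harmonic spectra should be categorized together" heuristic.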
Gestalt Perceptual Organization. These first examples are primarily concerned with the perception of simultaneous
auditory or musical events. The principles that govern melodic grouping are an example of the perceptual organization
of events unfolding over time. One class of perceptual mechanisms that seem to be employed extensively in melodic
grouping are those of Gestalt psychology, exemplified by the work of Wertheimer (1924/1950). The kinds of principles
that can govern auditory organization are the same as those described for vision: proximity, similarity, good
continuation, and common fate. The dimensions along which these principles operate include frequency, amplitude,
timbre, temporal position, and spatial location (see Bregman, 1990; Deutsch, 1999), just as in the visual realm they
might apply to different dimensions such as hue, brightness, saturation, spatial location, and temporal position. The
kinds of rules that govern what a melody can be seem to stem nicely from Gestalt principles. These principles may be
an example of a kind of innate knowledge that is applied to the perception of music. It hardly seems plausible that such
principles evolved so that they could contribute to the musicality of humankind. If anything, they evolved (probably
well before Homo sapiens did) in order to increase the fluency with which organisms interpret their environments.
Conceptual Representation. We now turn to the more cognitive or representational end of the perception-cognition
continuum. Gestalt psychology also serves well to explain cognitive principles used in music perception. Krumhansl
(1990) has described how tonality can be considered a holistic property. Individual tones within tonal contexts are not
perceived in an atomistic manner, but rather in terms of their relation to the whole. Three contextual principles
summarize the findings of such top-down processing on the relationship between tones: contextual identity, contextual
distance, and contextual asymmetry. These findings parallel more general principles of cognitive representation, and
could just as easily describe our knowledge about colors or artifacts (see Rosch, 1975; Tversky, 1977; Krumhansl,
1978). Again, here we have a case in which a major constraining principle in musical organization seems to be an
implementation of more general cognitive principles. Even the concept of musical key, which in the European
tonal-harmonic system requires quite an abstract representation, can be thought of as an extension of similar principles
of cognitive organization.
Another feature of music that illustrates cognitive principles is the existence of musical schemata, representations of
the musical regularities in one's culture. The most basic musical schemata contain information about the transition
probabilities between musical events. For example, American listeners show implicit knowledge of the chord transition
probabilities of tonal-harmonic music (Bharucha and Stoeckig, 1986). We have already discussed how it is to an
organism's advantage to make perceptual inferences from an ambiguous signal in order to interpret the environment
effectively. It is also to the organism's advantage to recognize patterns over time that it encounters frequently in order
to make predictions and process subsequent information more efficiently. Elements of many musical systems seem to
take advantage of the brain's ability to make predictions based on prior experience.
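As an illustration of such schematic knowledge, first-order transition probabilities can be estimated from a corpus of event sequences. The sketch below uses hypothetical Roman-numeral chord labels and an invented toy corpus; it is not drawn from the studies cited above.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate first-order transition probabilities from event sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for prev, c in counts.items()}

# Toy corpus of chord progressions (labels are illustrative only)
corpus = [["I", "IV", "V", "I"], ["I", "V", "I"], ["I", "IV", "I"]]
probs = transition_probabilities(corpus)
# After "V", this schema strongly predicts "I"; frequently confirmed
# transitions are exactly the ones processed most fluently.
```

A listener-like system equipped with such a table can grade incoming events by their conditional probability, which is one simple way to cash out "predictions based on prior experience."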
Language. As previously mentioned, the most extensive application of a "linguistic" approach to music is Lerdahl and
Jackendoff's (1983) Generative Theory of Tonal Music. Taking a linguistic approach to the study of music, however,
does not presuppose that the cognitive structure underlying music is the same as that of language. The parallel often
suggested between the syntax of language and the structure of music does not seem to hold. One of many fundamental
differences between the two is that while language is concerned with absolute grammaticality (either a sentence is
well-formed or it is not), music is concerned with preference (one particular interpretation is preferred over several
good interpretations).
There are some parallels, however. Patel (1998) has proposed a shared structural integration resource (SSIR)
hypothesis, which suggests that the cognitive operations performed within the domains of language and music are
distinct but that common systems are used in the process of structural integration. Such an organization explains one of
the most compelling similarities between music and language: a hierarchical structure that unfolds over time. One piece
of evidence to support this SSIR hypothesis is the fact that a particular event-related potential (ERP) known as a
correlate of syntactic processing in language (the P600) also is observed when listening to music that deviates from
expected patterns (Patel et al., 1998).
This ability to integrate patterns over time, to build expectations based on what was just perceived while at the same
time integrating each new piece of information, seems to be a good candidate for a universal feature of music. At the
same time, such an ability seems not to be unique but shared with language, a more complex domain with a more
compelling evolutionary story. In fact, one might argue that such processes in language began as more elementary
temporal pattern recognition processes. Recent brain imaging research has found evidence that the amount of
processing of a visual sequence within Wernicke's area, a posterior area of the superior temporal gyrus associated with
language comprehension, is correlated with the predictability of the sequence (Bischoff-Grethe et al., 2000).
Emotion. The link between music and emotion is also a compelling issue for any psychological theory of music. Again,
it seems to be the case that there is no need to postulate some sort of special mechanisms on the part of music that lead
to its emotional qualities. Consider some of the basic findings of research on cognition, physiology, and emotion.
William James (1890) believed that the physiological response of the body to an emotional event preceded the
cognitive interpretation of what the experienced emotion was. This idea was strengthened and modified by the work of
Stanley Schachter (e.g. Schachter and Singer, 1962), who found that an otherwise unexplainable physiological reaction
was necessary for an emotional experience, which would then be interpreted in different ways depending upon the
context surrounding those reactions.
One kind of experience that can lead to an emotional reaction is the disruption of a cognitive schema (Mandler, 1984).
As mentioned earlier, abstracting information about past experiences into general principles about regularities in the
world is an adaptive trait, allowing organisms to process information more quickly when it is predicted by the schema.
When events occur that violate the predictions of the schema, physiological arousal occurs to indicate that something in
the environment requires attention. This arousal may then be experienced as an emotion, such as fear.
The emotions associated with music may emerge from such a process of schema disruption and interpretation. In fact,
the very nature of music's aesthetic value may depend on its ability to fulfill and violate expectations based upon
previous experience to varying degrees (Meyer, 1956; Dowling and Harwood, 1986). The fact that our knowledge
underlying such expectations seems to be modular and therefore ineffable may contribute critically to this aesthetic
experience (see Fodor, 1983; Raffman, 1993; Justus and Bharucha, submitted).
Our point is not to devalue the emotional qualities of music, but rather to illustrate how they may be a natural
consequence of the ways in which prior experience, physiology, and cognition interact to create emotional experiences
in general. Again, if this quality of music seems to be an implementation of a broader psychological principle, there is
no need to postulate special musical mechanisms underlying the emotions it causes us to experience.
What would it take to argue for such a position? First, an innate constraint in musical processing that seems to be
unique or at least primary to music must be described precisely. Perhaps when the study of music from a cognitive
science perspective begins to catch up with linguistics, with the aid of a more integrated developmental psychology and
ethnomusicology, we will be in a position to evaluate whether or not any such unique biologically-specified
mechanisms exist. Second, a convincing argument must be made for why these particular mechanisms, used in the
context of musical activities, would have been well-designed to give Homo adaptive advantages. While it is tempting to
start a discussion of music's evolution with speculation of how music would have been adaptive to our ancestors, doing
so without discussing what exactly music cognition corresponds to developmentally fails to rule out any number of
alternative possibilities.
In our view, given the current state of knowledge about music and the brain, there is no compelling reason to label
music as an evolutionary adaptation to the exclusion of many other hypotheses. Such a conclusion is not inconsistent
with the belief that music is a universal and cherished human trait, as many universal human qualities may share
similar evolutionary pasts. Nor is it inconsistent with the belief that music is a product of the evolutionary process,
since as we have discussed even the most general processes of learning and cultural transmission are products of an
evolved brain. Instead, music may illustrate a unique combination of non-adaptationist evolutionary processes and
cultural transmission. Whether any part of music has entered the house of the human mind from the front stairs, to use
James's metaphor, remains an open question.
References
Bharucha, J.J. and Mencl, W.E. (1996). Two issues in auditory cognition: Self-organization of octave
categories and pitch-invariant pattern recognition. Psychological Science, 7, 142-149.
Bharucha, J.J. and Stoeckig, K. (1986). Response time and musical expectancy: Priming of chords.
Journal of Experimental Psychology: Human Perception and Performance, 12, 403-410.
Bischoff-Grethe, A., Proper, S.M., Mao, H., Daniels, K.A., and Berns, G.S. (2000). Conscious and
unconscious processing of nonverbal predictability in Wernicke's area. Journal of Neuroscience, 20,
1975-1981.
Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge,
MA: MIT Press.
Brown, S. (2000). The "musilanguage" model of music evolution. In N.L. Wallin, B. Merker, and S.
Brown (Eds.), The Origins of Music (pp. 271-300). Cambridge, MA: MIT Press.
Buss, D.M., Haselton, M.G., Shackelford, T.K., Bleske, A.L., and Wakefield, J.C. (1998). Adaptations,
exaptations, and spandrels. American Psychologist, 53, 533-548.
Clarkson, M.G. and Clifton, R.K. (1985). Infant pitch perception: Evidence from responding to pitch
categories and the missing fundamental. Journal of the Acoustical Society of America, 77, 1521-1528.
Darwin, C. (1859/1964). On the Origin of Species. Cambridge, MA: Harvard University Press.
Demany, L. and Armand, P. (1984). The perceptual reality of tone chroma in early infancy. Journal of
the Acoustical Society of America, 76, 57-66.
Depew, D. J. and Weber, B. H. (1995). Darwinism Evolving: Systems Dynamics and the Genealogy of
Natural Selection. Cambridge, MA: MIT Press.
Deutsch, D. (1999). Grouping mechanisms in music. In D. Deutsch (Ed.), The Psychology of Music (2nd
ed., pp. 299-348). San Diego, CA: Academic Press.
Dobzhansky, T. and Boesiger, E. (1983). Human Culture: A Moment in Evolution. New York:
Columbia University Press.
Dowling, W.J. and Harwood, D.L. (1986). Music Cognition. San Diego, CA: Academic Press.
Fodor, J.A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Gould, S.J. and Lewontin, R.C. (1979). The spandrels of San Marco and the Panglossian paradigm: A
critique of the adaptationist programme. Proceedings of the Royal Society of London B, 205, 581-598.
Gould, S.J. and Vrba, E.S. (1982). Exaptation - a missing term in the science of form. Paleobiology, 8,
4-15.
Helmholtz, H.L.F. von. (1925). Treatise on Physiological Optics. New York: Dover. (Original work
published 1867)
Helmholtz, H.L.F. von. (1954). On the Sensation of Tone as a Physiological Basis for the Theory of
Music. New York: Dover. (Original work published 1877)
Huron, D. (1999). An instinct for music: Is music an evolutionary adaptation? The 1999 Ernest Bloch
Lectures, Department of Music, University of California, Berkeley.
Jackendoff, R. (1994). Patterns in the Mind: Language and Human Nature. New York: Basic Books.
Key, C. A. and Aiello, L. C. (1999). The evolution of social organization. In R. Dunbar, C. Knight, and
C. Power (Eds.), The Evolution of Culture: An Interdisciplinary View. Edinburgh: Edinburgh University
Press.
Krumhansl, C.L. (1978). Concerning the applicability of geometric models to similarity data: The
interrelationship between similarity and spatial density. Psychological Review, 85, 445-463.
Krumhansl, C.L. (1990). Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.
Lerdahl, F. and Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT
Press.
Lewontin, R.C. (1990). The evolution of cognition. In D.N. Osherson and E.E. Smith (Eds.), Thinking:
An Invitation to Cognitive Science, Volume 3 (pp. 229-246). Cambridge, MA: MIT Press.
Mandler, G. (1984). Mind and Body: Psychology of Emotion and Stress. New York: Norton.
Mead, M. (1964). Continuities in Cultural Evolution. New Haven, CT: Yale University Press.
Meyer, L. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
Miller, G. (2000). Evolution of human music through sexual selection. In N.L. Wallin, B. Merker, and
S. Brown (Eds.), The Origins of Music (pp. 329-360). Cambridge, MA: MIT Press.
Patel, A. (1998). Syntactic processing in language and music: Different cognitive operations, similar
neural resources? Music Perception, 16, 27-42.
Patel, A., Gibson, E., Ratner, J., Besson, M., and Holcomb, P. (1998). Processing syntactic relations in
language and music: An event-related potential study. Journal of Cognitive Neuroscience, 10, 717-733.
Pinker, S. and Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences,
13, 707-754.
Raffman, D. (1993). Language, Music, and Mind. Cambridge, MA: MIT Press.
Schachter, S. and Singer, J.E. (1962). Cognitive, social, and physiological determinants of emotional
state. Psychological Review, 69, 379-399.
Symons, D. (1992). On the use and misuse of Darwinism in the study of human behavior. In J.H.
Barkow, L. Cosmides, and J. Tooby (Eds.), The Adapted Mind: Evolutionary Psychology and the
Generation of Culture (pp. 137-159). Oxford: Oxford University Press.
Tooby, J. and Cosmides, L. (1992). The psychological foundations of culture. In J.H. Barkow, L.
Cosmides, and J. Tooby (Eds.), The Adapted Mind: Evolutionary Psychology and the Generation of
Culture (pp. 19-136). Oxford: Oxford University Press.
Wallin, N.L., Merker, B., and Brown, S. (2000). The Origins of Music. Cambridge, MA: MIT Press.
Wertheimer, M. (1950). Gestalt theory. In W.D. Ellis (Ed.), A Sourcebook of Gestalt Psychology (pp.
1-11). New York: Humanities Press. (Original work published 1924)
Williams, G. (1966). Adaptation and Natural Selection. Princeton, NJ: Princeton University Press.
Proceedings abstract
George Papadelis
University of Thessaloniki
Greece
Background
Aims
Method
Results
Comparative analysis between the high and the low performance groups revealed
that subjects' efficiency in classifying the best exemplars within each
category reflects the accuracy that characterises the category's mental
representation throughout its whole range. Additionally, it was concluded
that extreme tempo values and a high degree of structural complexity in a
pattern mainly affect categorisation performance for those subjects who
exhibit a low degree of rhythmic skill.
Conclusions
final stage, where accurate and detailed representations are made and fine-grained distinctions are possible.
Proceedings paper
Introduction
Simultaneous notes in the printed score (chords) are not played strictly simultaneously by pianists. As reported in the literature, an emphasised voice is not only played
louder, but additionally precedes the other voices typically by around 30ms (melody lead; Hartmann, 1932; Vernon, 1937; Palmer, 1989, 1996; Repp, 1996). It is still
unclear whether this phenomenon is "a common expressive feature in music performance ... that aids listeners in identifying the melody in multivoiced music" (Palmer,
1996, 51). An alternative hypothesis is that it may be mostly due to the timing characteristics of the piano action (velocity artefact, Repp, 1996) and therefore a result of a
dynamic differentiation of different voices. Especially in chords played by the right hand, high correlations between velocity difference and melody lead (between melody
notes and accompaniment) seem to confirm this velocity artefact assumption (Repp, 1996).
The investigated data, derived mostly from computer-monitored pianos, represent the asynchronies at the hammer-string contact points. The present study focuses
on asynchrony patterns at the finger-key contact times as well. Finger-key profiles represent what pianists initially do when striking chords. In this paper, we show that the
melody lead phenomenon disappears at the finger-key level. That means that pianists tend to strike the keys almost simultaneously, and it is only the different dynamics
(velocities) that result in the typical hammer-string asynchronies (melody lead).
Background
In considering note onset asynchronies, one has to differentiate between asynchronies that are indicated in the score (arpeggios, appoggiaturas) and asynchronies that are
performed but not explicitly indicated in the score. For the latter, two typical types have been observed in the literature: (1) the melody precedes the other voices by about 30 ms
(melody lead), or (2) the melody lags in comparison to the other voices. Type 2 asynchronies occur mostly between two hands, e.g. a bass note is played clearly before the
melody (melody lag or bass lead), which is well known from old recordings of piano performances. Type 1 asynchronies are more common within one hand (especially
within the right hand, because melody often corresponds with the highest voice).
Method
Materials
The Etude op. 10/3 (first 21 measures) and the Ballade op. 38 (initial section, bar 1 to 45) by Frédéric Chopin were recorded on a Boesendorfer SE290
computer-monitored concert grand piano by 22 skilled pianists (9 female and 13 male). They were professional pianists, graduate students or professors from the Vienna
Music University, came well prepared to the recording sessions, but were nevertheless allowed to use the music scores during recording. Additionally the pianists were
asked to play the initial 9 bars of the Ballade in two versions: once particularly stressing the melody (first voice) and once stressing the third voice (the lowest voice of the
upper stave). The performance sessions were recorded onto digital audio tape (DAT) and the MIDI data from the Boesendorfer grand piano was stored on a PC's hard disc.
All performances were of a very high pianistic and musical level and contained very few errors (the overall error rate for the Etude and the Ballade was 0.34% and 0.66%,
respectively).
Apparatus
The Boesendorfer SE290 has one set of shutters at the hammers, which provides two trip points: one at the point where the hammer crown just starts to contact the string and the other 5
mm lower. These two trip points provide two instants in time as the hammer travels upward, and the time difference between these instants yields the final hammer
velocity (FHV, in meters per second), which can be transformed into MIDI velocity.
The instant at which the trip point at the strings occurs is taken as note onset time. The note onset times show a timing precision of 1.25 milliseconds, the FHV
measurement has a counter period of about 0.04 milliseconds.
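The velocity measurement described above amounts to dividing the 5 mm trip-point spacing by the measured time difference. A minimal sketch follows; the SE290's actual calibration and its mapping from FHV to MIDI velocity are not specified in the text, so none is attempted here, and the example numbers are illustrative only.

```python
TRIP_DISTANCE_M = 0.005  # 5 mm between the two shutter trip points (from the text)

def final_hammer_velocity(t_lower, t_upper):
    """Final hammer velocity (FHV, m/s) from the two trip-point times (s).

    t_lower: time at the trip point 5 mm below the strings
    t_upper: time at the trip point where the hammer crown reaches the string
    """
    return TRIP_DISTANCE_M / (t_upper - t_lower)

# Example: a hammer covering the last 5 mm in 2.5 ms travels at 2 m/s
v = final_hammer_velocity(0.000, 0.0025)
```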
Figure 1a, 1b. The timing characteristics of a grand piano action - the hammer-string contact times as functions of final hammer velocity (left side) and MIDI velocity (right side, solid lines). The
y-axes represent the time intervals from the finger-key strike times to the hammer-string contact times. The data in Figure 1a, provided by Askenfelt, are drawn in dotted lines with asterisks; the
horizontal lines indicate the key-bottom times for the three notes (piano, mezzo forte, forte), which are temporally displaced relative to the hammer-string contact times (see text).
Procedure
All note onsets and the velocity information were extracted from the performance data. This data was corrected and matched to a symbolic score in which each voice was
individually indexed. The error rate was very low (0.34% or 0.66%, see above). Wrong pitches were corrected and wrong or missing notes marked as missing. Timing
differences and velocity differences between the first voice (= melody) and each other voice were calculated for each event in the score. From that, asynchrony profiles and
correlations between timing and velocity differences were derived.
Results
Hammer-string asynchronies
The average chord profiles are shown in Fig. 2 & 3 (Etude, Ballade, solid lines with asterisks). All pianists play the first voice consistently louder than the accompaniment,
so it can undoubtedly be called the melody. As expected, the melody precedes the other voices by about 20-30 ms. In the Ballade the chord profiles are very similar to each other;
the melody lead increases slightly for the lower voices. The chord profiles of the Etude show more variability, especially in the left hand, where the bass voice (7) tends to
lead with some pianists. This corresponds to the type 2 asynchrony pattern (bass lead, esp. Pianists 6, 12, 13, 14, 16 and 21). A real outlier is Pianist 3, who plays the
melody up to 50 ms before the accompaniment. In this case, it is a deliberate idiosyncrasy that Pianist 3 uses to emphasise the melody (personal communication with Pianist 3).
In the exaggerated first-voice versions of the first 9 bars of the Ballade, the stressed voice is played louder than in the normal version (1.4 m/s versus 1.0 m/s on average),
while the accompaniment maintains its dynamic range. The melody lead increases to 40-50 ms (see Figure 4). The third-voice versions show the same tendency, but to
a smaller extent. The third voice is played loudest (at about 1.2 m/s on average), the melody is still quite prominent (about 0.8 m/s), and the other voices are as usual. The
third voice leads the first voice by about 20 ms, and the remaining voices lag by another 20 ms (Figure 4). Thus, when pianists are asked to emphasise one voice, they
play this voice louder and enlarge the dynamic distance to the accompaniment. In parallel, the timing difference increases correspondingly.
Figures 2 & 3. The average chord profiles for the average version (bold lines) and the 22 individual performances (grey lines). The profiles plot the averaged timing delays of the individual voices
relative to the melody (voice 1). The right tree of lines (solid lines, average version with asterisks) represents the hammer-string domain (h-st); the left tree (dotted lines, average version with circles)
indicates the finger-key domain (fg-k). Pianist 3 is outlined by the (coloured) lines with diamonds.
Figure 4. Average chord profiles of the 22 individual performances. The versions with the 1st & the 3rd voice stressed are indicated by asterisks or circles respectively (hammer-string domain, solid
lines). The finger-key domain is plotted in dotted lines.
Generally, the larger the dynamic differences, the greater the extent of the melody lead. This overall tendency can also be measured for each single event in the score; the per-event correlations are given in Table 1.
Table 1. Correlations between velocity differences and timing differences for 22 recordings and their average, two pieces and two hands. The significance level is indicated by asterisks (* p < .05; **
p < .01)
The within-right-hand coefficients are usually higher than the between-hand coefficients (l.h. in Table 1), more so for the Etude than for the Ballade. The lower left-hand
correlations indicate greater independence between the two hands. The often-anticipated bass note is one example of this tendency, as displayed by Pianists 5, 12, 13,
14, 16 and 20 in the Etude. The correlation coefficients for the special versions of the Ballade show a similar picture and are not reported here.
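The per-event relationship behind Table 1 can be illustrated with a toy computation. The function below is a standard Pearson correlation; the difference values are invented, shaped only to mimic the reported pattern in which notes that are relatively louder than the melody also arrive earlier (negative coefficients).

```python
import numpy as np

# Illustrative only: Pearson correlation between per-event velocity
# differences and timing differences, in the spirit of Table 1. The toy
# numbers are invented; a negative r means relatively louder notes lead.

def pearson_r(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

vel_diff  = [-30, -25, -20, -15, -10, -5]   # accompaniment minus melody
time_diff = [ 32,  27,  24,  18,  12,  6]   # ms delay behind the melody
r = pearson_r(vel_diff, time_diff)          # strongly negative
```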
The correlations of timing and velocity differences measure a linear relation, whereas the timing correction curve (TCC) of the piano action follows an inverse power
relationship (Fig. 1). However, to get at least an impression of the slope of the TCC, the FHV-MIDI velocity data were approximated by a linear curve (Fig. 1b). Figure 5
shows the scatter plots of the timing and velocity differences of the average of the 22 recordings for both hands. The interpolated slope of the TCC (-.69) is slightly steeper
than the slopes of the scatter plots. The left-hand slope of the Etude (-.27) is less steep because of the asynchrony tendencies described above. The figure shows that
the expected and the observed directions of the velocity artefact (the slope of the TCC and the regression line of the average data, respectively) are quite similar.
Figure 5. MIDI velocity differences against timing differences of the average of the 22 recordings.
The dotted line indicates the linearly interpolated slope of the TCC function (slope = -.69).
Finger-key asynchronies
To provide an overview of the initial finger-key times, a special finger-key version of each recording was computed, in which the onset times were corrected by the TCC as a
function of FHV. The average chord profiles of the finger-key versions are plotted as dotted lines in Figures 2 & 3; the average finger-key chord profile of the average
versions is shown as solid lines with circles.
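A numerical sketch of this correction follows. Each hammer-string onset is shifted back by the key-to-string travel time, modelled here as an inverse power function of final hammer velocity (FHV); the coefficients a and b are invented for illustration, as the real TCC was measured on the Boesendorfer SE system.

```python
# Sketch (not the authors' code) of the finger-key correction: the travel
# time grows as FHV falls, so softer notes are shifted back further.

def travel_time_ms(fhv, a=55.0, b=0.6):
    """Invented TCC: slower hammers take longer to reach the string."""
    return a * fhv ** -b

def finger_key_onset(hs_onset_ms, fhv):
    """Reconstruct the finger-key onset from the hammer-string onset."""
    return hs_onset_ms - travel_time_ms(fhv)

# Loud melody note vs soft accompaniment note, 25 ms apart at the strings:
melody_fk = finger_key_onset(1000.0, fhv=3.0)   # short travel time
accomp_fk = finger_key_onset(1025.0, fhv=1.0)   # long travel time
fk_lead = accomp_fk - melody_fk                 # shrinks to roughly zero
```

With these (invented) coefficients, the 25 ms hammer-string melody lead nearly vanishes at the finger-key level, which is the qualitative pattern the corrected profiles show.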
The chord profiles of the finger-key versions are clearly much more synchronous than the hammer-string patterns. The melody lead for the right hand is
reduced to about zero, which indicates a strong effect. The differences between the hammer-string and the finger-key profiles are significant at p < .01 across all pianists,
pieces and voices (two-tailed t-test), whereas the delays of the other voices relative to the melody in the finger-key profiles are all statistically non-significant
(p > .05). This indicates that the residual deviations from zero in the finger-key profiles are not reliable.
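The two significance checks just described can be sketched as follows; the t statistic is computed directly, and the delay values are invented for illustration (the real tests were run over all pianists, pieces and voices).

```python
import numpy as np

# Hedged illustration: a paired comparison of hammer-string vs finger-key
# delays, and a one-sample test of the finger-key delays against zero.

def t_one_sample(x, mu=0.0):
    """One-sample t statistic for mean(x) against mu."""
    x = np.asarray(x, float)
    return float((x.mean() - mu) / (x.std(ddof=1) / np.sqrt(len(x))))

hs_delays = np.array([22.0, 28.0, 25.0, 31.0, 24.0])  # ms behind melody
fk_delays = np.array([-1.0,  2.0,  0.5, -2.0,  1.0])  # after TCC correction

t_paired = t_one_sample(hs_delays - fk_delays)  # clearly non-zero
t_fk     = t_one_sample(fk_delays)              # near zero
```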
The grey chord-profile cluster, which gives an impression of the individual chord profiles of the 22 pianists, is more homogeneous for the Ballade than for the Etude.
Discussion
The consistently high correlations between timing and dynamic differences show the strong overall dependency of melody lead on velocity. The more the melody is
separated dynamically from the accompaniment, the more it precedes it. Our method of calculation and our results coincide with Repp's study (Repp, 1996).
In addition to these findings, the finger-key versions show that most of the investigated melody lead disappears at this level. Pianists start to strike the keys
almost synchronously, which strongly supports the overall validity of the velocity-artefact assumption. They begin their acceleration essentially simultaneously, but the
different velocities cause the hammers to arrive at the strings at different points in time.
Nevertheless, pianists clearly play asynchronously in some cases as well. First, the left hand leads globally in the finger-key domain by about 10 ms, which is an
unexpected result. Is it possible that pianists compensate for the longer response time of softer notes, and therefore instinctively start to strike soft chords earlier? This is,
nota bene, an effect that occurs only between the hands, not within one hand. Second, as a special case of the first, the bass anticipations extend up to 150 ms in
some cases and to about 50 ms typically, when they occur. This distinct anticipation seems to be intentional, although probably often unconscious.
Another interesting finding is that pianists are evidently able to enlarge the melody lead deliberately beyond the usually observed 30 ms, as Pianist 3 shows. The
question is whether it is possible for pianists to differentiate voices in a chord dynamically without producing melody lead in the hammer-string domain. Common
sense suggests that this reversal is not possible, and there is no example in the data that suggests otherwise.
The findings of this study support the assumptions of Repp (1996) more than those of Palmer's studies (1989, 1996). Nevertheless, melody lead is of course a
phenomenon that helps a listener to identify the melody in a multi-voiced musical texture. Temporally offset elements tend to be perceived as belonging to separate
streams (stream segregation; Bregman & Pinker, 1978), and spectral masking effects are diminished by shifting one masking voice by several milliseconds (Rasch, 1978,
1979). But in the light of the present data, it does not seem that pianists produce melody lead primarily in order to separate voices temporally. The temporal shift of the melody
is rather a result of differentiating the voices dynamically. Melody lead is thus linked to dynamic differentiation; nevertheless, both phenomena have similar
perceptual effects, namely separating the melody from the accompaniment.
The TCC used to create the finger-key onset times is still a preliminary approximation of the grand piano's action characteristics.
Acknowledgements
This research was supported by the Austrian Federal Ministry of Education, Science and Culture in the framework of the START programme (grant no. Y99-INF). I especially
want to thank Wayne Stahnke, who generously gave insight into the functionality of the Boesendorfer SE system, and the Boesendorfer Company in Vienna, which
provided the SE290 Imperial grand piano for experimental use. I am grateful to Gerhard Widmer, Simon Dixon and Emilios Cambouropoulos for correcting earlier
versions of this paper.
References
Askenfelt, A. & Jansson, E. V. (1990). From touch to string vibrations. I: Timing in grand piano action. Journal of the Acoustical Society of America, 88, 52-63.
Askenfelt, A. & Jansson, E. V. (1991). From touch to string vibrations. II: The motion of the key and hammer. Journal of the Acoustical Society of America, 90,
2383-2393.
Askenfelt, A. (ed.) (1990). Five lectures on the acoustics of the piano. Stockholm (Publications issued by the Royal Swedish Academy of Music, 64).
Bregman, A. S. & Pinker, S. (1978). Auditory streaming and the building of timbre. Canadian Journal of Psychology, 32, 19-31.
Goebl, W. (1999). Analysis of piano performance: towards a common performance standard? Paper presented at the Society of Music Perception and Cognition
Conference (SMPC99), Evanston, USA, August 14-17, 1999.
Hartmann, A. (1932). Untersuchungen über das metrische Verhalten in musikalischen Interpretationsvarianten. Archiv für die gesamte Psychologie, 84, 103-192.
Huron, D. (1993). Note-onset asynchrony in J. S. Bach's two part inventions. Music Perception, 10, 435-444.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331-346.
Palmer, C. (1996). On the assignment of structure in music performance. Music Perception, 14, 23-56.
Rasch, R. A. (1978). The perception of simultaneous notes such as in polyphonic music. Acustica, 40, 21-33.
Rasch, R. A. (1979). Synchronization in performed ensemble music. Acustica, 43, 121-131.
Repp, B. H. (1996). Patterns of note onset asynchronies in expressive piano performance. Journal of the Acoustical Society of America, 100, 3917-3932.
Stahnke, W. (2000). Developer of the Boesendorfer SE system. Personal communication.
Vernon, L. N. (1937). Synchronization of chords in artistic piano music. In C. E. Seashore (ed.), Objective analysis of musical performance (Studies in the Psychology of Music, IV, pp. 306-345). Iowa: University Press.
Proceedings paper
1 Background
According to Berlyne (1971), preference for stimuli is related to their complexity or unpredictability.
Although this claim has been supported by a large number of studies in the field of music (reviews by
Finnäs, 1989; and Fung, 1995; North & Hargreaves, 1995), adequate objective ways of measuring the
originality or complexity of music are in short supply. The existing objective measurements, such as
the information-theoretic and tone transition probability (Simonton, 1984) models possess
well-known limitations: First, research within this tradition has tended to employ limited stimuli,
which have typically been specially-composed pieces or excerpts of classical music. Secondly, these
models do not address the role of the listener's perceptual system in organising the structural
characteristics of music.
Therefore the aim of the present research was to (i) devise an objective computer model of those
perceptual processes which underlie human listeners' musical expectations and complexity
judgements; (ii) determine the extent to which this model could predict experimental participants'
complexity judgements in response to a range of musical stimuli; (iii) determine the effectiveness of
the model relative to that of the previous models; and (iv) provide an example of the model's
application to real music.
supported by Palmer & Krumhansl, 1990; Thompson, 1994) and duration of tones (e.g. Castellano,
Bharucha & Krumhansl, 1984; Monahan & Carterette, 1985). These modifications are made since
they emphasise tones that occur in more prominent locations or possess longer durations: These
factors both lead to the increased perceptual saliency of tones.
Intervallic factors consist of principles derived from Narmour's (1990) implication-realization model.
The principles are proximity, registral return, registral direction, closure, intervallic difference, and
consonance. These principles are hypothesised to be innate and are based on a variety of Gestalt laws
applied to tone-to-tone continuations. Here the principles are used to measure the extent to which
these implied patterns are violated. The coding of the model is derived from Krumhansl (1995).
Rhythmic factors include rhythmic variability, which accounts for changes in the duration of notes,
syncopation, which measures the amount of deviation from the regular beat pattern, and rhythmic
activity, which is simply the number of tones per second. All three rhythmic principles have been
found to increase the difficulty of perceiving or producing melodies (e.g. Clarke, 1985; Povel &
Essens, 1985; Conley, 1981, respectively).
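Two of the rhythmic factors can be made concrete with a short sketch. The exact formulas in the EBM may differ, so the definitions below (log-duration spread for variability, tones per second for activity) are plausible stand-ins, not the model's actual code.

```python
import math
import statistics

# Hedged illustration of two rhythmic factors, computed from a list of
# note durations in seconds. Formulas are assumptions for illustration.

def rhythmic_variability(durations):
    """Spread of note durations (stdev of log-durations; an assumed formula)."""
    return statistics.stdev(math.log(d) for d in durations)

def rhythmic_activity(durations):
    """Number of tones per second of music."""
    return len(durations) / sum(durations)

durations = [0.25, 0.25, 0.5, 0.25, 0.25, 0.5]  # an invented rhythm
activity = rhythmic_activity(durations)          # 6 tones in 2.0 s
```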
In short, melodies that create expectancies clearly structured in terms of their tonal,
intervallic, and rhythmic properties tend to be easier to reproduce and recognise, and are also judged
by listeners as less complex. The EBM processes MIDI melodies and produces a final melodic
complexity score by calculating a weighted sum of the principle scores. Full details of the EBM can be
found in Eerola and North (2000).
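The final step can be sketched as follows; the principle names come from the text, but the weights and scores below are invented placeholders (the real weights are fitted in Eerola & North, 2000).

```python
# Minimal sketch of the EBM's final step: melodic complexity as a weighted
# sum of the individual principle scores. All numbers below are invented.

def ebm_complexity(scores, weights):
    assert scores.keys() == weights.keys()
    return sum(weights[p] * scores[p] for p in scores)

principle_scores = {
    "tonality": 0.4, "intervals": 0.7, "rhythmic_variability": 0.5,
    "syncopation": 0.3, "rhythmic_activity": 0.6,
}
weights = {
    "tonality": 0.25, "intervals": 0.30, "rhythmic_variability": 0.15,
    "syncopation": 0.15, "rhythmic_activity": 0.15,
}
complexity = ebm_complexity(principle_scores, weights)
```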
musicological literature concerning them, and because the popularity of the songs can be easily
determined. Furthermore, an earlier study by West & Martindale (1996) showed that the arousal
potential of the lyrics of the Beatles increased across time and was not related to the popularity of the
songs, as measured by the record positions in the charts. The aim of the present research was to
employ the EBM in determining (a) whether the Beatles' melodies show an increase in arousal
potential over time and (b) whether the popularity of the melodies is linked with their complexity.
5.1 Analysis material
The material used in the study comprises all of the songs written by the Beatles for the
Beatles and published officially by the English record companies EMI and Apple in 1962-70. This
sampling deliberately excludes cover versions recorded by the Beatles and recordings of Beatles' songs
by other artists. This resulted in a set of 182 qualifying songs. The songs were arranged
chronologically by their original recording date (derived from Lewisohn, 1988). The
melodies of the songs were obtained from the most reliable notation of the Beatles' music available
(Fujita, Hagino, Kubo, & Sato, 1993), encoded as MIDI files, and transposed to a common key (C
major or minor) for the analyses. Grace notes and notated non-pitch information (e.g. shouting,
speaking) were removed when encoding the melodies for the computerised analysis.
5.2 Results of the archival study
The melodic complexity of the songs was first regressed onto recording dates, to see whether linear
time-trends existed across the Beatles' career. A highly significant increasing trend emerged (R2 =
.124, F = 25.42, df = 1,180, p < .001). More simply, the melodies of the Beatles' songs became more
complex over time. This is consistent with a previous study that has investigated the statistical
properties of the Beatles' lyrics (West & Martindale, 1996) by means of computerised content
analysis. Both results support Martindale's (1990) theory of aesthetic evolution, which proposes that
art works become increasingly arousing over time, but the trend could also be attributed to increasingly
sophisticated performance and compositional skills.
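The time-trend analysis can be sketched as an ordinary least-squares fit with R² computed from the residuals; the yearly values below are invented and merely mimic a significant increasing trend over the 1962-70 recordings.

```python
import numpy as np

# Illustration of regressing melodic complexity on recording date.
# The data points are invented for the sketch.

def linear_r2(x, y):
    """Least-squares slope and R^2 of a simple linear fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return float(slope), 1.0 - ss_res / ss_tot

years      = [1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970]
complexity = [0.42, 0.45, 0.44, 0.50, 0.49, 0.55, 0.53, 0.58, 0.60]
slope, r2 = linear_r2(years, complexity)  # positive slope: rising complexity
```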
Next, various popularity indices were correlated with melodic complexity. These included the number
of weeks the songs spent on the chart, the chart positions, and an aggregate function of both, for
singles and albums in the UK and US top 40 (Gambaccini et al., 1996; Whitburn, 1996). The
popularity of singles, as measured by weeks on the UK chart, correlated negatively with the melodic
complexity of the songs (r(23) = -.567, p < .01): the simpler the song melodically,
the longer it spent on the charts. Also, the chart success of albums (measured either by chart
position, chart duration or both) correlated negatively with the mean melodic complexity of the
albums (r(11)= -.729, p< .05), suggesting that melodically simpler albums have fared better in the
popular music markets. Finally, different poll results (Reed, 1982), compilations (Bronson, 1995) and
expert ratings (Larkin, 1994) were compared with the melodic complexity of the songs, but no clear
trends emerged from these analyses. Interestingly, the other models were not able to predict any
trends in the popularity of the Beatles' songs. The transition probability model, however, demonstrated
the same, although weaker, increase in melodic complexity over time as the EBM (R2 = .037, F = 6.91,
df = 1,180, p < .01).
It should be noted that although relationships between chart performance and melodic complexity
were observed in this study, several extraneous social, cultural and commercial influences certainly
affect the popularity of songs. However, the present findings may serve as a starting point for further
inquiries into questions of this nature.
6 Discussion
In this paper, an expectancy-based model of melodic complexity was tested. The EBM provided the
most accurate prediction of human responses to the present stimuli. However, it is important to
establish the extent to which the EBM can predict humans' responses to a wider range of melodies.
There is a risk that the model may be excessively tailored to the specific set of melodies employed in
the present research: some principles might be more important in predicting responses to kinds of
music different to those considered here. Also, the EBM ignores several aspects of music by
considering only single melodic lines. For example, the richness of arrangement, harmony and tempo
are factors outside the scope of the EBM but which might influence listeners' sense of the complexity
of the music in question.
In conclusion, the EBM may have potential for the analysis of very large samples of 'real' musical
stimuli in terms of their melodic complexity. This was demonstrated by analysing all the songs by the
Beatles, which revealed modest trends between melodic complexity and chart success of the singles
and albums; and also an increase in melodic complexity over time. Studies using considerably larger
samples of music are currently underway. Research along these lines could have a considerable
impact on our understanding of the perception and appreciation of music, as this computerised
approach opens up new possibilities for studying the relationship between the properties of music and
listeners' preferences.
7 References
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton-Century-Crofts.
Bronson, F. (1995). Billboard's hottest hot 100 hits. New York: Watson-Guptill.
Carlsen, J. C. (1981). Some factors which influence melodic expectancy. Psychomusicology, 1,
12-29.
Castellano, M. A., Bharucha, J. J., & Krumhansl, C. L. (1984). Tonal hierarchies in the music
of North India. Journal of Experimental Psychology: General, 113, 394-412.
Clarke, E. F. (1985). Some aspects of rhythm and expression in performances of Erik Satie's
"Gnossienne No. 5.", Music Perception, 2, 299-328.
Conley, J. K. (1981). Physical correlates of the judged complexity of music by subjects
differing in musical background. British Journal of Psychology, 72, 451-464.
Eerola, T., & North, A. C. (2000). Measuring melodic complexity: An expectancy-based model.
(Submitted for publication).
Finnäs, L. (1989). How can musical preferences be modified? Bulletin of the Council for
Research in Music Education, 102, 1-58.
Fujita, T., Hagino, Y., Kubo, H. & Sato, G. (Transcr.) (1993). The Beatles Complete Scores.
London: Wise Publications.
Fung, V. C. (1995). Music preference as a function of musical characteristics. The Quarterly
Journal of Music Teaching and Learning, 6, 30-45.
Gambaccini, P., Rice, T., & Rice, J. (1996). The Guinness book of top 40 charts. 2nd ed. UK:
Proceedings abstract
Mr Philippe Lacherez
lacher@psy.uq.edu.au
Background:
Recent models of pitch perception have suggested that pitch is not perceived
absolutely but involves comparisons between successive sounds. An important
paradigm in this research has been the use of specially synthesised "Shepard"
tones, which obscure absolute frequency information while retaining the
relationships between tones, thus permitting a direct analysis of this
comparison process. Recent work using such stimuli has led to the suggestion
that a motion-specific perceptual process may be involved in relative pitch
judgements.
Aims:
Method:
Results:
Conclusions:
Proceedings paper
domain-specific intelligences, with learning, memory and thinking machinery focused upon specific
areas of function" (1994, 189). This concept of modularity, with each domain somewhat autonomous
as it originated in response to specific needs, is perhaps most popularly exemplified in Gardner’s
(1983) list of multiple intelligences; we acquired linguistic, musical, logical-mathematical,
bodily-kinesthetic, spatial, intrapersonal, interpersonal, naturalist, and spiritual (or existential)
intelligences as a means of adaptation to changing circumstances in the environment.
Time — Rhythm— Hearing
Nature includes many periodicities, or rhythms. Many of these, such as day-night, lunar, and seasonal
cycles, impact animal behavior. Think of the effects of temperature changes on cold-blooded animals.
A snake, for example, is at the mercy of the environment as the temperature swings from chilly in the
morning to hot in the afternoon to cold at night. Other animals have internal regulators that help to
keep internal temperatures more constant. In humans, brain waves, hormonal outputs, and sleeping
patterns are examples of the more than 100 complex oscillations monitored by the brain (Farb, 1978).
Homeostasis (preservation of internal sameness), among higher animal forms and especially among
humans, represents freedom by making us more time-independent. The issue of time is not only
biological, but psychological as well, as it was just as important to find ways to create mental
homeostasis (a stable psychological world). "Inner sameness, whether biological or psychological (the
two cannot be separated in any clear-cut way), is an evolutionary invention peculiar to advanced
forms of life and necessary if living creatures are to avoid being the slave of time" (Campbell, 1986,
60).
Hearing is a primary sense through which we create a stable, inner world of time. Millions of years
ago when dinosaurs roamed the earth, mammals, then just small forest creatures, were forced to hunt
at night for safety's sake. Hunting at night requires a keen sense of hearing as sonic events occurring
over time must be ordered to become meaningful. A rustling of leaves may indicate a predator
approaching or prey retreating. Typically, organisms analyze sounds for patterns, patterns detected are
ascribed "meaning," and this meaning drives behavior (Mikiten, 1996). The more complex the sound
analyzer and pattern detector (i.e., the brain and its related sensory organs), the more complex the
patterns that can be detected and the more complex the resulting behaviors. Thus, evolution provided
human beings with a remarkable capacity to interpret sounds that are time-ordered. "To hear a
sequence of rustling noises in dry leaves as a connected pattern of movements in space is a very
primitive version of the ability to hear, say, Mozart's Jupiter Symphony as a piece of music, entire,
rather than as momentary sounds which come and then are no more ..." (Campbell, 1986, 263-264).
Biophony
The sonic world in which we evolved was filled with an incredible array of detectable patterns.
Modern living has detached us from the sounds of nature, but for our ancient ancestors their very
survival depended upon the ability to detect patterns in these sounds, derive meaning from them, and
adjust their behavior accordingly. Wind and water noises, bird calls, monkey screeches, and tiger
growls all had meaning. Beyond this, many (if not all) animal sounds were suffused with an
"emotional" content. They screamed in pain, roared a challenge, or offered enticements for mating.
Darwin contended that human musicality arose out of the emotional content of animal sound-making
when he said that "musical tones and rhythm were used by our half-human ancestors, during the
season of courtship, when animals of all kinds are excited not only by love, but by the strong passions
of jealousy, rivalry, and triumph" (1897/nd, 880).
Early humans would have heard these sounds not in isolation but holistically as a sound tapestry.
Krause’s (1987) niche hypothesis likens the sounds of nature (biophony) to a symphonic score. A
spectrogram of the sounds of the forest or around a pond shows that each species produces sounds that
occupy particular niches. If these sounds are important—mating calls, for example—they wouldn’t be
very effective if they were lost among all the other sounds. Thus, each animal has learned over the
millennia to create sounds that occupy a very particular stratum in the overall biophony, ensuring that
those for whom the sound is intended can pick it out.
Growing up in a particular sonic environment—growing up both in the sense of the individual and of
the generations over thousands of years—it is quite natural that we would make attempts to mimic the
sounds of nature. With our great brains we moved easily from mimicry to elaboration, extension,
synthesis, and eventually the creation of novel sounds. Thus, we occupy our own niche in the natural
order of sounds, but we are not content to remain in that niche. As a dramatic example, Krause (1987)
finds that it now takes 2,000 hours of field recording to acquire one hour of usable material; the
reason for this is that it is nearly impossible to find natural habitats that are not invaded by human
sounds.
Much of the earliest music would have been vocal (and other bodily sounds) and many of the earliest
instruments would have been biodegradable, having been made of reeds, wood, or skins, and thus lost
in the mists of time. Nevertheless, there are evidences of early music. Acoustical analyses of caves
show that those places where the acoustics are best are accompanied by many paintings; those places
where the acoustics are poor have few or no cave paintings. "Thus, the best places to view the artwork
of the cave appear to have been the best places to hear music or chants" (Allman, 1994, 216). Also
found in the caves are whistles, flutes, and mammoth bones that may have been used as drums or as
Ice Age xylophones.
Attema (2000) recently presented a 53,000-year-old bone flute made from an animal leg bone.
This is not a simple "toy" of the sort any child could make from a hollow tube. Rather, it is a fipple
flute (similar to a recorder), requiring a plug in the head joint with an air channel venting through an
air hole and tone holes properly spaced down the length of the tube. This is a startling demonstration
that even at that early stage in our development we had the brain power to figure out complex
problems. Moreover, this was obviously important enough to have invested a considerable amount of
time and energy to get it right. No doubt there were many unsuccessful attempts along the way. (See
Hodges and Haack, 1996, for a review of additional ancient musical artifacts.)
Mother/Infant Bonding
In consideration of the survival benefits music has to offer, the evolutionary advantage of the smile,
like music a universally-innate human trait, provides a useful analogy. From a cultural evolutionary
standpoint, the smile has taken on many diverse meanings. However, from a biological evolutionary
standpoint, the primary survival benefit may have been the bonding of mother and infant (Konner,
1987). Likewise, although music has many diverse cultural meanings today, at its roots it may also
have had survival benefits in connection with mother-infant bonding.
From Australopithecus africanus, nearly five million years ago, to modern humans, the brain has
nearly tripled in size (Cowan, 1979). If the human fetus were carried "full term" in terms of brain
development, the head would be too large to pass through the birth canal, and birth would be
impossible. The evolutionary solution to this problem is that we are now born with our brains
incompletely developed. It takes about six years for the brain to reach 90 percent of its eventual adult
size.
The result of this post-partum brain development is an increased period of dependency of infants on
their parents. Compared with other animal species, human infants are more helpless and for a
significantly longer period of time. The fact that human mothers most often give birth to single babies
rather than litters means that more time may be devoted to the individual child. While the baby is in
this stage, s/he is growing, developing, and learning at a tremendous rate. Nearly 75 percent of a
newborn's cerebral cortex is uncommitted to specific behaviors (Springer and Deutsch, 1989). This
uncommitted gray matter, called association areas, allows for the integration and synthesis of sensory
inputs in novel ways.
Mothers and newborns confer many important physiological and psychological benefits on each other
and chief among them are loving behaviors. Babies learn to love almost immediately and in turn are
nurtured by love. The importance of these loving interactions cannot be overstated.
Love and affection are communicated to a baby in a number of ways. Speaking, singing, and touching
(primarily in the form of rhythmic stroking, patting, and rocking) are three primary modes of
communicating with infants. Some psychologists have coined the term "motherese" in reference to the
particular kind of speech patterns mothers use with their infants (Birdsong, 1984). The musical
aspects of motherese are critically important, not only as an aid to language acquisition, but especially
in the communication of emotions. Long before youngsters begin to talk, they are adept at deciphering
the emotional content of speech, largely due to the musical characteristics of motherese. In motherese
speech, it is the pitch, timbral, dynamic, and rhythmic aspects to which the baby responds, certainly
not the verbal content. "You are an ugly baby" spoken in a soft, sing-song fashion will elicit a far
more positive response than "you are a beautiful baby" shouted in an angry tone.
Of course, the communication system is a two-way affair. Babies, too, are learning to give love as
well as receive it. Vocalizations are a primary way that babies express their feelings (Fridman, 1973;
Roberts, 1987). Even in the first few days of life, babies begin to establish a relationship with their
parents through their cries. In the first few months, they develop a wider range of crying styles that
form a particular kind of infant language. The development of variations in crying styles is important
to emotional development, in providing cues to parents regarding their state, and in practicing for the
eventual development of language. Babies learn to cry to gain attention and to express an increasing
range of feelings. Because their vocalizations are nonverbal, it is once again the manipulation of pitch,
timbre, rhythm, and dynamics (prosody) that forms the basis of their communications system.
Imagine a small tribe of people living many thousands of years ago. A mother sits cradling a newborn
baby in her arms. This baby will be totally dependent upon her for all the basic necessities of
life—food, clothing, shelter, protection—for nearly two years and somewhat dependent upon her for
many years after that. If the baby were not responsive to anything related to musical or pre-musical
behaviors, how would the mother communicate love? And if the mother could not communicate love,
how would the baby survive? And if the baby could not survive, how could the species survive?
Fortunately, the baby has an inborn capacity to respond to a wide range of pre-musical expressions. A
large part of this inborn response mechanism must deal with some notion of pleasure. Warmth,
assurance, security, contentedness, even nascent feelings of happiness, are all a part of what is being
shared with the baby. If these responses to pre-musical activities were wired into the brain, is it not
understandable that music still brings us deep pleasure long after cultural evolution has developed
these pre-musical behaviors into playing bagpipes or singing grand opera?
Music and Language
One of the outcomes of the mother/infant dyad discussed previously is that the baby becomes
motivated to recognize and respond to sound patterns that will later become necessary for speech
perception. When parents communicate with their infants, their "baby talk" quite naturally emphasizes
expressed through music. Members of one tribe must band together to fight off members of another
tribe. Music gives courage to those going off to battle and it gives comfort to those who must stay
behind. Much of the work of a tribal community requires the coordination of many laborers. Music
not only provides for synchrony of movement but also for relief from tedium. These are but a few of
the many ways music may have supplied a unifying force to early communities.
Memory is also of crucial importance to the survival of a society. Not only is memory of a
technological nature important—When best to plant? Where best to find game? How best to start a
fire?—but also the things that make the society unique and special. Who are we? Where did we come
from? What makes us better than our enemies who live on the other side of the river? Music is one of
the most effective mnemonic devices. It enables preliterate societies to retain information—not just
facts but the feelings that accompany the facts, as well. Poems, songs, and dances are primary vehicles
for the transmission of a heritage.
Much of musical thinking may be placed under a broader heading of "play," which may provide
significant evolutionary advantages. The importance of play is understood more clearly when seen in
the fullest sense of exploring, examining, and problem solving (Brown, 1994). Curiosity may have
killed the cat, but for human beings it has led to discoveries and inventions that have aided survival.
Playing with every aspect of the environment has led both to the invention of the bow and to the songs
and dances that accompany the hunt and the battle. Which is more important? Are not both necessary
for survival? In fact, some evidence suggests that the bow was initially as much a musical instrument
as it was a weapon (Mumford, 1966). Musical bows can be found among both Native American and
African tribes. There are indeed significant survival premiums in play, generally, and in musical play,
specifically. What human beings have learned about themselves and the world through music has
been of tremendous benefit.
Akin to the notion of play is Wilson's (1998) contention that the hand and brain co-evolved. That is,
developmental changes that took place in the arm and hand allowed for many more skills (e.g.,
grasping, throwing, pounding, creating and manipulating tools, etc.) and this spurred brain
development. Furthermore, hearing is important in tool use; for example, as it aids in the processes of
filing or hammering. The same combination of hearing, handedness in tool making, and brain would
have been involved in the creation of early bone flutes. Tinkering and experimenting with different
lengths of tubing, where to put the tone holes, how to direct the air through the air channel, and so on,
would have provided important mental problems to solve, with an emotional investment in the
outcome. (Wilson also devotes an entire chapter to the idea that music has an evolutionary basis as a
secondary heuristic.)
Perhaps the most important thing human beings have learned through music is how to deal with
feelings. Although certain emotional responses may be inborn as a protective mechanism,
by-and-large we have to learn to recognize and express feelings. One of the hallmarks of humanness
is a sensitivity to feelings that allows for many subtle nuances. Being fully human means to
experience the infinite shadings that exist between the polar ends of emotional states. Our experience
of these refined feelings is essentially nonverbal. Notice how limited our vocabulary is in this area and
how often we experience difficulty in telling another exactly how we feel.
Music may provide a means of conferring survival benefits through the socialization of emotions.
When group living is mandatory for survival, as it is for human beings, learning to react to others with
sensitivity has clear evolutionary advantages. Lions hunt in groups; however, after a kill has been
made each individual fights for his or her share. The biggest and strongest get the most to eat. This
display of aggression at feeding time necessitates a subsequent period of licking and
rubbing—"making up"—an activity necessary to keep the social bonds of the pride in place (Joubart,
1994). Among primates, grooming serves a similar purpose, while language (particularly gossip) may
do the same for humans (Wilson, 1998). Music, as previously suggested, also contributes.
Listening to the daily news is all one needs to do to realize that human beings still have to deal with
many aggressive behaviors in our societies. We need to find ways to separate actions from feelings.
How does one feel anger without acting on it? How does one avoid giving in to loneliness and
despair? It is important to learn how to feel deeply without always resorting to action. Music is one of
the most powerful outlets for expressing emotions. One can learn to cope with grief, frustration, and
anger or to express joy and love through musical experiences.
Each intelligence has developed because it provides a unique way of knowing about the world. Each
type of intelligence may be better suited for providing information about different aspects of the inner
and outer worlds of human beings. Music, no better and no worse than other types of intelligence,
provides its own type of information. Music is particularly useful in providing a medium for dealing
with the complex emotional responses that are primary attributes of humanity. Clearly, developing
means of controlling and refining emotions would have evolutionary advantages.
Concluding statement
Contrast "Minimal musical skills are not essential so far as we know …" (Brown, 1981, 233) with
"Art was as crucial a part of our ancient ancestor’s survival as finding food and shelter" (Allman,
1994, 209). Or compare "Why do we respond emotionally to music, when the messages therein seem
to be of no obvious survival value?" (Roederer, 1982, 38) with "It [music and art] represents activity
as basic for the survival of the human species as reproducing, getting food, or keeping predators at
bay" ("Pfeiffer, 1980, 74). Finally, consider
[Music] is a creation of the human brain that made use of structures it inherited from
evolution, and that were designed to serve biologically relevant functions, in order to
develop and sustain a domain of activity as yet unheard of and of no direct biologically
adaptive value. (Sergent, 1993, 20)
in relation to
Proper study of the organization of the brain shows that belief and creative art are
essential and universal features of all human life. They are not mere peripheral luxury
activities. They are literally the most important of all the functional features that ensure
human homeostasis. (Young, 1978, 231)
Which side has it right? Is it reasonable to agree that, like language, music is found in all human
groups; that, like language, it arises readily in children with some degree of rule-based structure; and
that, like language, it has identifiable neural structures devoted to it, yet that, unlike language, it
confers no survival benefits? It makes more sense that musicality was built into the human system over thousands of years
because, like language and all the other intelligences, it provides unique ways of knowing that
allowed our species to cope with the many uncertainties of life.
References
Allman, W. 1994. The stone age present. New York: Simon and Schuster.
Attema, J. 2000. Music from prehistoric times. Paper presented at Biomusic: The music of nature
and the nature of music. Washington, DC, February 19.
Birdsong, B. 1984. Motherese. In Science yearbook 1985: New illustrated encyclopedia, 56–61.
Lomax, A. 1968. Folk song style and culture. New Brunswick, NJ: Transaction Books.
Merriam, A. 1964. The anthropology of music. Evanston, IL: Northwestern University Press.
Mikiten, T. 1996. A method for research in music medicine. In MusicMedicine, Volume 2, ed.
R. Pratt and R. Spintge, 14–23.
Montagu, A., and F. Matson. 1979. The human connection. New York: McGraw-Hill.
Mumford, L. 1966. The myth of the machine. New York: Harcourt Brace Jovanovich.
Pfeiffer, J. 1980. Icons in the shadows. Science80 1, no. 4:72–79.
Plotkin, H. C. 1994. Darwin machines and the nature of knowledge. Cambridge, MA: Harvard
University Press.
Restak, R. 1979. The brain: The last frontier. New York: Warner Books.
Roberts, M. 1987. No language but a cry. Psychology Today 21, no. 5:41.
Roederer, J. 1982. Physical and neuropsychological foundations of music. In Music, mind, and
brain, ed. M. Clynes, 37–46. New York: Plenum Press.
Roederer, J. 1984. The search for a survival value of music. Music Perception 13:350–56.
Sergent, J. 1993. Mapping the musical brain. Human Brain Mapping 1, no. 1:20–38.
Springer, S., and G. Deutsch. 1989. Left brain, right brain, 3d ed. New York: Freeman.
Stern, D. 1982. Some interactive functions of rhythm changes between mother and infant. In
Interaction rhythms: Periodicity in communication behavior, ed. M. Davis, 101–17. New York:
Human Sciences Press.
Stiller, A. 1987. Toward a biology of music. Opus 35:12–15.
Wehr, T. 1982. Circadian rhythm disturbances in depression and mania. In Rhythmic aspects of
behavior, eds. F. Brown and R. Graeber, 399–428. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Wilson, F. 1986. Tone deaf and all thumbs? New York: Viking Penguin.
Wilson, F. 1998. The hand. New York: Vintage Books.
Young, J. 1978. Programs of the brain. Oxford: Oxford University Press.
ABSTRACT
This paper will consist of sociological reflections on the importance of music in everyday life. It will
be organised around two themes. First, the dialectic between music and noise. On the one hand, I'll
consider the ways in which music has become noise--the issue here isn't simply the ubiquitous
presence of music as the background sound of an increasing number of public, domestic, and private
activities, but also the circumstances under which such music becomes irritating, intrusive, a threat.
On the other hand, I'll consider the ways in which noise becomes music--my interest here is in the use
of volume and the effects of electronic sound production.
My second theme is listening. What does it mean to listen to music? What is involved in terms of
attention and engagement? I ask these questions not as a psychologist, wondering what's going on
inside people's heads, but as a sociologist, interested in both the social circumstances in which people
pay a special kind of attention to what they hear (what's involved here is both a timetable of
engagement and a sense of proper musical space) and in the discourse they use to explain what they
are doing.
While it is easy enough to describe the social and economic forces that have put music everywhere in
our lives, it is less easy to explain how, nonetheless, music remains a special experience. The question
I want to raise is what that 'specialness' now involves, with particular reference to the concept of
choice and key everyday musical institutions like the radio, the CD player and the club.
Proceedings

Paper titles, grouped by parallel session:

Session 1:
- Understanding the artist: exploring how singers are evaluated
- Voice, emotion and facial gesture in singing
- The voice in therapy: monitoring disease process in chronic degenerative illness (Abstract)

Session 2:
- Formal approaches to the evaluation of music compositions of children by external judges: rating scales, rubrics and other techniques (Abstract)
- The use of consensual assessment in the evaluation of children's music compositions (Abstract)
- Children's perception of their own music compositions (Abstract)

Session 3:
- Drift and timing variability in isochronous interval production with and without music imagery (Abstract)
- Functional imaging of rhythm perception
- On the persistence of metrical percepts (Abstract)

Session 4:
- Prosody, meaning and musical behaviour
- The birth of music in synchronous chorusing at the hominid-chimpanzee split
- Music in human evolution (Abstract)

Session 5:
- The nature of musical emotions: a perspective from psychology
- Is the emotional system isolable from the cognitive system in the brain?
- Musical emotions: a perspective from development

Session 6:
- Auditory roughness estimation of complex tones (Abstract)
- The perceptual organisation of complex tones in a free field
- Musical timbre beyond a single note: tessitura and context (Abstract)

Sessions S1 to S6:

S1. Symposium: Children's compositions: understanding the process and outcomes. Convenor: MacDonald, R. Chair: Miell, D. Discussant: Folkestad, G. (Abstract)
- An empirical investigation of the social and musical processes involved in children's collaborative compositions (Abstract)

S2. Thematic session: Perception of harmony and tonality. Chair: Ohgushi, K.
- More about the (weak) differences between musicians' and non-musicians' abilities to process harmonic structures (Abstract)

S3. Thematic session: Music and meaning. Chair: Deliège, I.
- Deriving meaning from sound: an experiment in ecological acoustics

S4. Thematic session: Computational models. Chair: Belardinelli, M.
- Artist: a connectionist model of musical acculturisation

S5. Symposium: Current trends in the study of music and emotion. Convenors: Juslin, P., Zentner, M. Chair: Zentner, M. Discussant: Scherer, K.
- Unresolved issues in continuous response methodology

S6. Thematic session: Music and movement. Chair: Desain, P.
- A sensory-motor theory IV: vestibular responses to music
Symposium introduction
Title of Symposium: The Power of the Voice for Singer and Listener
Symposium Rationale: Using the human voice as a means of musical production has long been recognised as a way of
permitting any individual to have immediate access to musical expression. It has also been anecdotally reported to be
the most powerful of all musical instruments for eliciting emotional responses in both performers and listeners. So, to
investigate the voice with all its potentialities is clearly important for the psychologist. However, little systematic study
has been undertaken to consider how voice production is achieved and received within a psychodynamic framework.
That is, how singers and listeners interpret the singing experience.
Aims: The current symposium offers a broad range of research perspectives on the voice as a means of musical
expression and communication. The focal point for all presentations is an exploration of the psychodynamics elicited as
individuals engage in performing and listening to the voice. In the first presentation, the impact of the singer and his or
her stage presence is explored, and criteria are described which encapsulate what makes a moving or beautiful
performance. In the second paper (highlighting the impact of stage behaviour and opera house convention) a detailed
study of how non-vocal gestures are used to achieve enhanced emotional expression in singing is considered. In the
third paper, the role of the voice to monitor degenerative illness is considered in a case study of Music Therapy with a
Multiple Sclerosis client. Finally, the fourth paper offers some insight into communication in singing within Christian
worship.
Back to index
Symposium Introduction
Assessment of Children's Music Compositions: In Looking Out and Out Looking In
Convenor: Webster, P
The aim of this series of presentations and group discussion is to explore what
we know about the approaches to evaluating music compositions of children.
Music composition in the schools has had a long tradition in the United
Kingdom, Australia and certain Western European countries, and is now
beginning in earnest in the United States as part of the Voluntary National
Standards in the Arts. Experiments in the effective assessment of these
compositions, as part of a long-range plan to develop well-rounded music
experiences for children, are relatively new.
This two-hour symposium will address both past research and practice on this
topic, as well as offer possibilities for future work. Four researchers from
the United States and the UK will offer perspectives. Two of the papers (Mellor
and Seddon) will deal with children's own self-assessment of compositions and
the other two papers (Hickey and Webster) will explore assessments of the
compositions by adults. Mellor will concentrate on self-assessment patterns
across age and gender boundaries, and Seddon will present data on how children view
their own computer-generated compositions in light of their past music tuition.
Webster will review the formal approaches to the assessment of compositions by
adult judges (rating scales, rubrics, written descriptions) and Hickey will
present evidence to support consensual assessment.
The convenor will begin the symposium with some introductory remarks and will
moderate the discussion. Contributions from the audience will be an important
part of the symposium, encouraging participants to share their experiences with
the topic. Each paper will be restricted to 20 minutes, allowing for about 30
minutes of discussion.
Symposium introduction
The study of music and emotion should be at the very heart of music psychology. Yet this is a topic that has been
seriously neglected during the last few decades. Contemporary volumes on music psychology rarely discuss musical
emotions at any length. However, there has recently been a resurgence of interest in research on music and emotion. The
intention of this symposium is to bring together researchers who have made theoretical and empirical contributions to
the field in order to show current trends in the study of music and emotion. Topics addressed by this symposium
include: the nature of musical emotions, the relationship between emotion and cognition, the development of musical
emotions, similarities between speech and music, continuous recording of emotional responses, parallels and contrasts
between recognition and induction of emotion. The symposium is organised into two parts. Each part concludes with a
discussion of important problems featuring an invited expert as discussant.
Proceedings paper
Performance observation
To date, there has been limited research interest in the topic. Wapnick et al. (1997) carried out a brief study
of university entrance auditions for singers and discovered that singers who were more animated,
smiled more often and established more eye contact with the assessors were rated as attractive. Thus it
seems important to consider the role of gestures in the assessment of performance quality. Indeed,
Davidson (1991, 1993, 1994), in a series of studies about the content of body movements discovered
that musical intentions were more clearly revealed in physical gestures than in musical sounds,
suggesting the critical perceptual role of the body in understanding and communicating a musical
work to the audience.
Furthermore, Sundberg (1982) in a study on speech, song and the emotions argues that there is a
connection between the psychological emotion to be transmitted, the performer's external body
movements and the acoustic consequences of the gesture of the speech organ - the internal body
movements of the vocal tract which result in varying the timbre of the singing tone.
In spite of the fact that some performers and teachers are aware of the importance of body movement
in conveying musical expression, it is not clear to what extent body movements are considered
important criteria in the assessment of musical performance.
It is noteworthy that in a study by Saunders and Holahan (1997) it was discovered that both teachers
and assessors discriminate readily between levels of technical and artistic attainment, and use these
two distinct categories to determine an area of 'performance error'. To date, however, no explanation has been
attempted of how the elements said to constitute musical performance quality are judged. There
is an acknowledgement that performances comprise both technical merit and aesthetic appeal, as
Saunders and Holahan's work describes, yet there are no indications of what stocks of knowledge
are being drawn upon when assessments are made. The current study is an attempt to explore what
these criteria may be.
The population under consideration in the current study is the staff and students of the Vocal Studies
Department at the Guildhall School of Music in London where second year mid-term assessments
were examined.
their own written reports, but they offered comments in order to construct a collaborative final report.
This report was eventually given to the students along with their grades. All performances were video
taped and all discussions were tape-recorded.
Results
In summary, using an interpretative phenomenological approach to analysing the data, the assessors
revealed that the following 'criteria' were being used:
- Technical Control
The assessors were very concerned about how the voice and body were controlled. That is, how the
technical aspects of singing were embodied. For instance:
'At the moment, what we hear is a good voice. Everything else is slightly lacking: not enough
connection, support, and engagement with body and brain. (...) At the moment, the small voice
and big physique don't match'.
Perhaps the most significant sub-theme to emerge from the technical issues was the concept of vocal
support. If a singer does not have a strong foundation of support, with the correct development of the
muscles in question, the voice cannot reach its full potential.
'Today there was little to help the sound. No teamwork between the muscles'.
'Posture- she collapses too much, needs to cool it physically'
'There is quite shocking neck tension and an overall lack of support. The tension begins to make
the voice wobble (...) whilst the potential is very impressive we feel as though it will be
compromised unless the physical tensions are removed'.
In summary, it seems that the assessors looked for an ideal connection in the body for the achievement
of technical control. Additionally, there was a concern that the mind and body were working in
synchrony for the performance:
'The voice seems very young- the inflexibility-vocally, mentally and physically this was
worrying'.
'The voice breaks because he does not know how to draw on his physical strength, and
this leads to not harnessing the passion that is in him'.
The single most striking and surprising criterion to emerge was the emphasis the assessors placed on
the physical appearance of the singers.
- The Body and Appeal
Comments of this type were very often of a personal nature. Here are several examples:
'Odd looking chap' (female assessor of a male singer)
'Visually: Odd make-up and ill-fitting cardigan' (male assessor of a female singer)
'Pretty girl. Stood like a dancer' (female assessor)
'A big guy. With a high lyrical voice' (female assessor)
'A rather puppet like physical appearance' (male assessor of a female singer)
'Very (oddly) splayed feet' (male assessor of a female singer)
'Bow ankles and sweater covered hands. You seem a bit motherly matronly in this outfit' (male
assessor of a female singer)
Physical appeal was very often the first thing the assessors noted when asked to write their
impressions of the performer and his or her performance.
- 'Bodily Communication'
The assessors also focused heavily upon bodily communication, and more specifically upon aspects
related to the use of facial expression and eye contact, as shown in comments such as:
'A self-possessed beam'
'A visual "performing" element missing. A problem of self-image: Does he need/want to
develop as a performer?'
'Very appealing visual/facial expression', or on the contrary, 'Eyes dead. Blank face'
'Body involvement needed'
'Lovely freedom of body movement'
'Eyes attempted audience involvement'
In summary, one could say that the body was regarded not only as the physical support of the singing
process, but also as a means of expression and so, a primary means of communication with the
audience. From these comments it is evident that the physicality of singing and how this interfaces
with the performer's inner mental processes - what we believe the assessors' label of 'artistic
communication' to mean - are key criteria in the assessment process.
Indeed, the next most strongly emergent criterion employed by all of the assessors was 'artistry'. Here,
we try to deconstruct what they meant by this term through the issues they raised in discussion around its use.
- The importance of artistry
We have identified three components of artistry: communication, performance personality and
presence.
Communication
1) Communicating meaning
The meaning of the song or aria's message was a central concern. The assessors suggest that ideally
the interpretation should be 'heart-felt', 'from the centre of the person', and therefore with
'self-possession'. The implication of these terms seems to be that an expressive performance
emerges out of the individual's personality and presentation on stage.
2) Interacting with the audience
The singers showed commitment in interacting with the audience in various degrees and in different
styles varying from physically approaching the audience to simply smiling, or introducing the
performance pieces to explain their performance intentions.
Connected to the concept of singer-audience interaction, there was much discussion of the singer's
'presence' - the assessors referred to it as the singer's projection of the 'self' on the stage.
Presence
The more focused the singer is on transmitting the musical intention with a strong projection of
personality, the more the assessors seem to be captivated and willing to interact, and therefore,
consider the singer as being 'appealing'. The underlying logic may be, as one of the assessors
commented: 'The singer is so focused that nothing interferes with our relationship, and so, not only the
composer's message is important, but the singer also acknowledges that I am here and I am important
too'.
If the singer is not sufficiently 'present', if he or she does not bring sufficient personality to the stage,
the assessors feel either that there is a lack of energy or interest, or that the singer is hidden behind the
song's message. Lack of energy or interest in acting is then considered as a deficiency in the
performance itself:
'A lovely sound, but rather disappointing as she doesn't get involved as an artist'
Singing Personality
Although the assessors are aware of a 'performing personality' which seems to be different from the
'inner self', on several occasions a process of identification between the performing and the inner
personality occurred. This was perhaps because the singer had identified with the song and so
internalised its meaning, or, conversely, because the singer had acted in a
rather convincing way:
'Charming girl-Charming voice.'
'A sweet and sunny personality. A sweet and sunny voice.'
'The Barber was transfixing. Lots of intelligence, self-possession and humility here.'
'I am just flooded with pictures of Sarah Black, his girlfriend-Why not? This is the reality of
this song.'
'This is an engaging performing personality showing great intensity. It is all engaged and
heartfelt.'
From the above it can be inferred that singers are expected to display their emotions in
overt behaviour. It seems that the assessors expect the overt behaviour to show internal states. How
much of this is 'acting' and what effect this 'acting' has on the singers' personality and identity is
clearly a fascinating emergent issue to which we have no further insight at the present time. We are all
aware of the cultural stereotype of the 'luvvy', 'loud' and 'extravagant' personality of the operatic Diva.
We would have to ask whether the job demands this kind of behaviour or if this kind of person is
attracted to the stage.
Discussion
From the analysis of the data, it is evident that in assessment the body is a critical factor for
consideration: how does the body look; how is it presented; how is the singing physically prepared
and executed? The interface of personality through both music and stage presence was critically
important. Also, the appropriateness of repertoire to the singer's level of achievement is of great
influence in assessing performance.
The results clearly show two different dimensions of criteria used by the assessors; those related to the
technique of sound production; and those related to the presentation of musical content such as
emotional expression and the personality of the interpreter. These two dimensions proved to be highly
interrelated, since it is evident that a correct technique not only enables but also integrates a greater
degree of artistry, which would be the main aim of each performance.
However, in the assessment procedure, technical proficiency and artistry seem to work as
'compensation laws'. That is, if a student has not acquired sufficient technical proficiency, the
assessors attend more to how involving and touching the performance is artistically; if everything is
technically correct but the performance lacks emotional expression and personality, the technical content
becomes the central focus. Hence the compensation of the technical for the artistic, and vice versa.
As far as technical proficiency is concerned, comments, critiques and even solutions were presented to
the singers in a far more objective way than the comments made about the artistry level of the
performance. This may be because technical proficiency is less subject to stylistic and social
influence, and relates to more concrete (bodily, facial and physical) aspects of the sound production.
Another emergent criterion of assessment was the importance of body movements as a means of
expression, on the one hand, of the emotional state of the singer and, on the other, as a means of
conveying structural and expressive features of the music to the audience. Therefore, it would be
relevant to explore the existence of a vocabulary of both body movements and phonation processes,
which would enable the singers to achieve better performances.
References
Balk, H.W. (1991). The Radiant Performer: The Spiral Path to Performing Power. University of
Minnesota Press, Minneapolis.
Davidson, J.W. (1991). The perception of expressive movement in music performance. Unpublished
doctoral dissertation. City University, London.
Davidson, J.W. (1993). Visual perception of performance manner in the movements of solo
musicians. Psychology of Music, 21, 103-113.
Davidson, J.W. (1994). What type of information is conveyed in the body movements of solo
musician performers? Journal of Human Movement Studies, 6, 279-301.
Howard, V.A. (1982). Artistry: the work of artists. Hackett, Indianapolis.
Noy, P. (1993). How Music Conveys Emotion. In S. Feder, R.L. Karmel and G.H. Pollock (Eds.).
Psychoanalytic Explorations in Music. International Universities Press, Madison, Connecticut.
Radocy, R.E. (1989). A review of Singing and Self: The Psychology of Performing (by S.E.
Stedman). Council for Research in Music Education, 100, 23-26.
Saunders, T.C. & Holahan, J.M. (1997). Criteria-specific rating scales in the evaluation of high school
instrumental performance. Journal of Research in Music Education, 45, 259-270.
Sundberg, J. (1982). Speech, song and emotions. In M. Clynes (Ed.). Music, Mind and Brain: the
neuropsychology of music. Plenum, New York.
Wapnick, J., Darrow, A.A., Kovacs, J. & Dalrymple, L. (1997). Effects of physical attractiveness on
evaluation of vocal performance. Journal of Research in Music Education, 45(3), 470-479.
Proceedings Abstract
FORMAL APPROACHES TO THE EVALUATION OF MUSIC COMPOSITIONS OF CHILDREN BY
EXTERNAL JUDGES: RATING SCALES, RUBRICS AND OTHER TECHNIQUES
pwebster@nwu.edu
Background:
Aims:
The aim of this paper is to provide an overview of the major research-based and
conceptual work available on the evaluation of children's music compositions
using rating scales, rubrics, checklists and other psychometric techniques.
Open-ended items will also be considered in this analysis. Problems and
opportunities of these approaches will be summarized.
Work from the international literature will be reviewed including the studies
from the British Journal of Music Education and the Australian literature. Work
from various state committees in the United States will be included as will
evaluation efforts like Harvard's Project Zero and the National Assessment of
Educational Progress. The emphasis of the review will be to spotlight the more
aesthetic-based assessment efforts.
Main contributions:
see above
Implications
Data from this review will be useful in designing new assessment tools and for
evaluating their effectiveness more rigorously.
Proceedings paper
...language must have evolved out of some prior system, and yet there does
not seem to be any such system out of which it could have evolved.
(Bickerton, 1990, p. 8)
Bickerton proceeds to define the properties of human languages and to illuminate the means by which
speech is acquired, as well as to examine claims for linguistic abilities in other species such as the
chimpanzee.
Like Pinker (1994) and Jusczyk (1997), Bickerton focuses especially on the properties of generative
grammar which are universal across languages, as formulated in the work of Chomsky (1975). In this
linguistic tradition, tools for the analysis of the syntactical, semantic and lexical elements of language
have been developed which explore convincingly the cognitive scaffolding through which language is
acquired and the structures on which its employment depends. The biological basis of language in
adapted respiration (Deacon, 1997, pp. 247-252), and its acoustical components (Laitman, J., et al,
1990) and antecedents in animal communication (Scherer, 1992) have, by comparison, received less
attention within this tradition. An outcome of this divergence of methodologies is Pinker's (1997)
conclusion that "music...shows the clearest sign of not being (an adaptation)": a hypothesis quite at
variance with that supported within the fledgling field of biomusicology (Wallin, 1991; Bannan, 1997;
Vaneechoutte & Skoyles, 1998; Cross, 1999; Wallin, Merker & Brown, 2000) that, to quote Tomatis
(1991), 'music is the substrate of language'.
Bickerton (1990) argues that speech allows humans to exchange representations of the world which
language permits us to formulate: the feat of representation is as significant as communication. The
latter may be present, even elaborate, in a variety of species of monkey, bird and cetacean; but the
former, with its empowerment of self-consciousness, is exclusive to our species.
This paper seeks to question assumptions that representation is confined to syntactic components of
language, and to assert that, by contrast, meaning can be both represented and communicated by
features of language which draw on musical perception and production.
Whilst ritual and co-ordinated behaviour vary enormously between cultures, the capacity for
simultaneous action moderated by musical response would seem to be an inseparable aspect of this
biological inheritance (Merker, 2000). It has its parallels in the animal kingdom:
insists that 'evolution has favoured the rules themselves and not their consequences'. His analysis
provides criteria for rule-based, species-specific behaviour as genetically and culturally evolved which
define the properties which need to be tested to illuminate such claims. His reasons are as follows,
somewhat adapted to illustrate their applicability to human vocality:
1 efficiency
2 consistency
3 adaptability
4 dependence
• efficiency: the rules are simpler than the behaviours they generate
• consistency: protection against the consequences of 'rogue' mutation
• adaptability: small changes in the rules cause big changes in behaviour
• dependence: sensitivity to the group protects the individual
Bet
Bate
Bit [Birt/Burt]
Beet
Boot (Scottish) [Bute]
Failure to select and perform the correct vowel sound requires the listener to rely more on context to
correctly decipher what is being communicated. One can see here a separation of roles between the
meaning derived from the sound of words themselves and that derived from context dependent on
grammatical relationships.
Experiments with sound and meaning
Bickerton (1990) adopts instances from his studies of pidgin languages to illustrate the limitations of
meaning which arise where primitive syntactic structures fail to embed one phrase within another. His
purpose in doing so was to test structural assumptions in Premack's (1985) contribution to the
'continuity paradox' debate, arising from his studies of chimpanzee communication, wherein he
extrapolated a hypothetical 'inter-language' which might bridge the gap between 'animal'
proto-language and (human) 'true language'.
On paper, Bickerton's resolution of ambiguities through the use of conjunctions and reflexives
conveys clearly the dependence of meaning on the certainties which are provided by the evolved
capacities of advanced grammar, and we can marvel at the engineering achievement this view of
language represents. But what of the pidgin user, himself probably illiterate and oblivious of such
means of parsing the speech he is uttering?
Bickerton lays down a challenge with his statement 'without grammatical items it would be
impossible' to determine what is meant. Let's take the alternatives available; the two Bickerton cites
plus two other potential 'performances'. Can intonation do the job of 'grammatical items'? If so, how
can the phrase
John tell Bill boil milk
be inflected in spoken language to yield these meanings?:
1 John told Bill to boil the milk
2 John told Bill the milk was boiling
Experimentation with a listener illustrates immediately that phrases 1 and 3 could be unambiguously
conveyed through intonation. Phrases 2 and 4 are more problematic, until the word order is changed,
at which point
John tell Bill milk boil
could be inflected to convey both meanings clearly.
Bickerton anticipated this change of word order, citing it as a property of 'the mechanisms of true
language, even without grammatical items'. However, attempts to convey the meanings of phrases 2
and 4 without re-ordering leads to further distinction of meaning, through the emphasis placed on, say,
boil (as opposed to roast or whip) and milk (as opposed to oil or whisky). The possibility also exists
of forming various questions:
Did John tell Bill to boil the milk? etc.
through the convention of creating a contour for the phrase which rises in pitch.
Further experiments can be designed to yield meaning out of nonsense. In addition to two old
favourites of the teacher of English punctuation (and what is punctuation but the notation of
characteristics of timing and tone of voice?), it seems appropriate to borrow from Pinker (1994) an
example which provides the means to question his own subsequent position regarding the
evolutionary relationship of musical behaviour and language.
Intonation vs syntax
an experiment which the reader is invited to replicate
1 Smith where Jones had had had had had had had had had had had his teacher's approval
2 There's too much gap between George and and and and and Dragon
(traditional English punctuation-test examples)
Conclusions
In Wallin et al (2000), a range of researchers in animal communication, anthropology, linguistics,
music theory and neurology considered different aspects of The Origins of Music. Bickerton made his
own, cautious contribution, but not before exposure to the influence of others involved:
Until I heard the stunning presentation by François-Bernard Mâche, I would
probably have said, by analogy with language, that music was unlikely to be
in any sense a continuation of nonhuman song or any other form of
behaviour. After I heard Mâche's recordings of a vast range of different
traditions in human music, each one accompanied by an eerily similar effect
produced by an avian, mammalian, or even amphibian species, I was not so
sure. If anyone could produce such a performance with linguistic material, I
would be tempted to convert to continuism overnight.
(Bickerton, in Wallin et al, 2000, p. 161)
Could Bickerton himself be moving towards a position in which he embraces evolved, musical
vocalisation as a resolution of the 'continuity paradox' he defined?
Bibliography and References
Bannan, N. (1997) 'The consequences for singing teaching of an adaptationist approach to vocal
development', in Proceedings of the First International Conference on Music in Human Adaptation,
VirginiaTech/MMB Music Inc. (pp. 39-46)
Bickerton, D. (1990) Language and species Chicago: University of Chicago Press
Blacking, J (1987) 'A commonsense view of all music' Cambridge: Cambridge University Press
Campbell, P S (1991) Lessons from the world New York: Schirmer
Chomsky, N. (1975) Reflections on Language New York: Pantheon
Cross, I. (1999) 'Is music the most important thing we ever did? Music, development and evolution',
in Music, Mind and Science (Suk Won Yi), Ed., Seoul, Korea: Seoul National University Press
Dargie, D (1988) Xhosa music Cape Town: David Philip
Deacon, T. (1997) The symbolic species London: Allen Lane
Gould, S (1977) Ontogeny and Phylogeny Cambridge, Harvard University Press
Jürgens, U. (1992) 'On the neurobiology of vocal communication', in Papousek, H.,
Jürgens, U. and Papousek, M. Nonverbal vocal communication Cambridge: Cambridge University Press
Jusczyk, P. (1997) The discovery of spoken language Cambridge, Mass.: MIT Press
Laitman, J., Reidenberg, J., Gannon, P., Johanson, B., Landahl, K. & Lieberman, P. (1990) 'The Kebara
hyoid: what can it tell us about the evolution of the hominid vocal tract?' American Journal of
Physical Anthropology 18, 254
Locke, J (1993) The child's path to spoken language Cambridge, Harvard University Press
Mithen, S. (1996) The prehistory of the mind London: Thames & Hudson
Pinker, S. (1994) The Language Instinct London: Allen Lane
_________(1997) How the Mind Works New York: Norton
Premack, D (1985) ''Gavagai!' or the future history of the animal language controversy' Cognition 19:
207-96
Scherer, K (1992) 'Vocal affect expression as symptom, symbol and appeal' in Papousek, Jürgens and
Papousek Nonverbal vocal communication Cambridge University Press
Vaneechoutte, M. and Skoyles, J.R. (1998) 'The memetic origin of language: modern humans as
musical primates.' in Journal of Memetics - Evolutionary Models of Information Transmission, 2.
http://www.cpm.mmu.ac.uk/jom-emit/1998/vol2/vaneechoutte_m&skoyles_jr.html
Stewart, I. (1998) Life's other secret: The new mathematics of the living world London: Allen Lane
Tobias, P. (1987) 'The brain of Homo habilis: a new level of organisation in cerebral evolution'
Journal of Human Evolution 16, 741-61
Tomatis, A.A. (1991) The conscious ear Barrytown, NY: Station Hill Press
Tumat, G. (1992) Vocal solos in Tuva: Voices from the Land of the Eagles Leiden: Pan Records
Wallin, N. (1991) Biomusicology New York: Pendragon
Wallin, N., Merker, B. and Brown, S. (2000) The origins of music Cambridge, MA: MIT Press
Woodward, S. (1992) 'The transmission of music into the human uterus and the response to music of
the human fetus and neonate'. Unpublished doctoral dissertation, University of Cape Town, South
Africa.
Proceedings abstract
Marcel R. Zentner
Department of Psychology
University of Geneva
CH - 1205 Geneva
SWITZERLAND
Background:
1. Scholars tend to rely upon labels of every-day life emotions when studying emotional responses to music.
However, it is unclear whether such emotional labels are musically plausible.
2. While past research has systematically examined the efficacy of different film excerpts to induce emotion,
comparable research with music excerpts is still fragmentary.
Aims:
Two studies with participants from diverse populations and age groups were conducted to empirically examine the
nature, structure, and organisation of musical emotions and to identify music excerpts that are effective in eliciting
certain emotions. The aim of Study 1 was to provide an empirical basis for developing a lexicon of musically plausible
affect terms. 138 subjects rated 149 affect terms (derived from a pre-study) for the frequency with which these states
were both expressed and induced by their preferred style of music (classical, pop-rock, or techno). Rarely occurring
states were excluded, yielding a reduction of approximately 80 emotion labels. The aims of the second study were (a)
to gather comparative ratings on the emotional effects of a variety of music excerpts, (b) to examine the basic
dimensions of musical emotions, and (c) on this basis, to refine the lexicon of musically plausible affect terms.
Subjects (N=184) listened to 20 excerpts of either classical or rock music and rated them on the new set of emotion
words derived from Study 1.
Results:
Study 1: First, occurrence of emotions, be they expressed or induced, differs according to the musical style. Second,
across all emotions and musical styles, there is a considerable difference between expression and arousal of emotion.
Third, there are also significant interactions between emotion clusters, musical styles and emotion modality (expressed
vs. induced).
Study 2: Factor analyses were carried out to identify a number of basic "musical emotions". Furthermore, the
propensity of the music excerpts to arouse different emotions is described.
Implications:
Implications are discussed for the development of a taxonomy of "musical emotions", a scale to measure "musical
emotions", and for the choice of music excerpts to be used for emotion induction.
Proceedings paper
i Introduction
During the last forty years a number of models quantifying auditory roughness have been proposed
and employed in a series of studies, demonstrating a relatively low degree of predictive power.
Correct estimation of the degree of roughness of a pair of sines, or of an arbitrary spectrum, is
necessary before any claimed link between roughness and some acoustic, perceptual, or musical
variable can be tested; it is also an important step towards the difficult task of quantifying
inharmonicity.
Roughness is one of the perceptual attributes of amplitude fluctuation. Musical sounds are represented
by vibration signals whose characteristics, practically always, change with time. Amplitude
fluctuations (variations of a signal's amplitude around some reference value) constitute one such
change and can be placed in three broad perceptual categories depending on the amplitude fluctuation
rate. Slow amplitude fluctuations (up to about 20 per second) are perceived as loudness fluctuations,
referred to as beating. As the rate of amplitude fluctuation increases, the loudness appears to be
constant and the fluctuations are perceived as 'fluttering' or roughness. The roughness sensation
reaches a maximal strength and then gradually diminishes until it disappears (at about 75-150 amplitude fluctuations per
second, depending on the actual vibration frequency). These distinct perceptual categories do not
reflect any fundamental qualitative differences in the vibrational frame of reference and should be
approached as alternative manifestations of a single physical phenomenon.
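As a minimal numerical sketch of the point above (not taken from the paper; all names and parameter values are illustrative), the fluctuation rate of a two-sine signal can be read off its amplitude envelope:

```python
import numpy as np

def fluctuation_rate(f1, f2, a1=1.0, a2=1.0, fs=8000, dur=2.0):
    """Estimate the amplitude-fluctuation rate (fluctuations per second)
    of the sum of two sines, by counting minima of the amplitude
    envelope |a1*e^(i*2*pi*f1*t) + a2*e^(i*2*pi*f2*t)|."""
    t = np.arange(0.0, dur, 1.0 / fs)
    env = np.abs(a1 * np.exp(2j * np.pi * f1 * t) +
                 a2 * np.exp(2j * np.pi * f2 * t))
    # one envelope minimum per fluctuation cycle
    minima = (env[1:-1] < env[:-2]) & (env[1:-1] < env[2:])
    return minima.sum() / dur
```

For equal-amplitude sines the estimate recovers the familiar beat rate f2 - f1 (e.g. 440 Hz plus 444 Hz fluctuates about four times per second), placing the percept in the slow, 'beating' category described above.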
If we accept that the ear performs a frequency analysis on incoming signals, the above perceptual
categories can be related directly to the bandwidth of the analysis filters. For example, in the simplest
case of amplitude fluctuations resulting from the addition of two sine signals, the following statements
represent the general consensus: If the filter bandwidth is much larger than the fluctuation rate then a
single tone is perceived, either with fluctuating loudness (and sometimes pitch) or with roughness.
And if the filter bandwidth is much smaller than the fluctuation rate then a complex tone is perceived,
to which one or more pitches can be assigned but which, in general (see note 1), exhibits no loudness fluctuation
or roughness.
In the first case, the degree, rate, and shape (sine / complex) of amplitude fluctuations are parameters
that are manipulated by musicians of various cultures, exploring the beating and roughness sensations.
Manipulating the degree and rate of amplitude fluctuation helps create a shimmering (e.g. Indonesian
gamelan performances) or rattling (e.g. Bosnian ganga singing) sonic canvas that becomes the
backdrop for further musical elaboration. It permits the creation of timbral (e.g. Middle Eastern mijwiz
playing) or even rhythmic (e.g. ganga singing) contrasts through gradual or abrupt changes between
fluctuation rates and degrees (see note 2). Whether those contrasts are explicitly sought (as in ganga singing,
mijwiz playing, or even the use of 'modulation' wheels/pedals in modern popular music) or happen
more subtly and gradually (as may be the case in the typical chord progressions/modulations of
Western music), they form an important part of a musical tradition's expressive vocabulary.
Important clues regarding the ways various musical cultures approach roughness and other perceptual
attributes of amplitude fluctuation may be found through an examination of musical instrument
construction and performance practice. Additionally, the different choices among musical traditions
with regard to vertical sonorities (e.g. harmonic intervals, chords, etc.) can reveal a variety of
attitudes towards the sonic possibilities opened up by the manipulation of amplitude fluctuation in
general and the sensation of roughness in particular.
Similarly to the sensation of beats, the sensation of roughness has often been associated with the
concepts of consonance/dissonance, whether those have been understood as aesthetically loaded
(Rameau, Romieu, in Carlton, 1990; Kameoka & Kuriyagawa, 1969a; Terhardt, 1974a&b, 1984) or
not (Helmholtz, 1885; Hindemith, 1945; von Békésy, 1960; Plomp & Levelt, 1965.) Some of the
studies addressing the sensation of roughness have occasionally (e.g. Stumpf, 1890, in von Békésy,
1960: 348; Vogel, 1993) been too keen to find a definite and universally acceptable justification
of the 'natural inevitability' and 'aesthetic superiority' of Western music theory. This has prevented
them from seriously examining the physical and physiological correlates of the roughness sensation.
On the contrary, Helmholtz, the first researcher to tackle the issue theoretically and experimentally,
concluded that:
Whether one combination [of tones] is rougher or smoother than another depends solely
on the anatomical structure of the ear, and has nothing to do with psychological motives.
But what degree of roughness a hearer is inclined to ... as a means of musical expression
depends on taste and habit; hence the boundary between consonances and dissonances
has frequently changed ... and will still further change... (Helmholtz, 1885: 234-235.)
The present study adopts this position and treats the sensation of roughness simply as a perceptual
attribute that can be manipulated through controlling the degree and rate of amplitude fluctuation,
providing means of sonic variation and musical expression.
ii Existing roughness estimation models and their application
The two principal studies that have systematically examined the sensation of roughness (von Békésy,
1960: 344-354; Terhardt, 1974a) have, to a large extent, been ignored by the majority of models
quantifying roughness / smoothness. Numerous such models have been proposed (Plomp & Levelt,
1965; Kameoka & Kuriyagawa, 1969a&b; Hutchinson & Knopoff, 1978), and have been employed in
later studies (Bigand et al., 1996; Vos, 1986; Dibben, 1999), demonstrating a low degree of agreement
between predicted and experimental data. Dibben, for example, found no correlation between sensory
consonance (smoothness), as predicted by the Hutchinson & Knopoff model, and the completeness /
stability ratings of the final bars of selected musical pieces. She concluded that sensory consonance /
dissonance is not a good measure of musical stability / tension, or completeness / incompleteness,
interpreting her conclusion as supporting the need for an alternative model of consonance /
dissonance. Her study is a good example of an attempt to load the concept of consonance with
meanings that go far beyond the scope of the model employed. It basically demonstrates that the
degree of smoothness of a vertical sonority is not a good measure of its sense of stability or
completeness. This result could have been anticipated since the 'sense of stability' of any given event
may be highly related to the events that precede it, while roughness models calculate the roughness of
isolated vertical sonorities. The surprising fact is not the results of Dibben's study but the implied
expectations that: a) a measure of 'smoothness' could correlate with multidimensional and highly
temporal and context dependent notions such as stability or completeness, and b) any model of
consonance / dissonance should map to stability / tension responses. It appears that the concept of
consonance (even more than that of timbre - see Bregman, 1990: 93) has been a 'wastebasket' of all
kinds of aesthetic and evaluative judgments in music, as well as the box of treasures for justification
arguments regarding general stylistic trends or specific compositional decisions.
There are, however, many reasons (other than those posited by Dibben) for the revision of the existing
models quantifying roughness, some of which have already been pointed out by other researchers and
some of which will be addressed by the present study.
Vos (1986) pointed out a number of inconsistencies in the Plomp & Levelt and the Kameoka &
Kuriyagawa models (see note 3), with regard to the critical bandwidth model derived from loudness summation
experiments (Zwicker et al., 1957). In his study, Vos suggested some adjustments that would bring the
predictions of all three models to a better agreement. Hutchinson & Knopoff's model has been
criticized (Bigand et al. 1996) for its relatively crude representation of the nonlinear relationship
between the amplitude fluctuation rate corresponding to maximum roughness and the frequency of the
lower of the interfering sines.
A recent model (Sethares, 1998) has the advantage of being based on a large number of direct
smoothness / roughness experimental ratings of pairs of sines, fitting a function that accounts for the
above mentioned nonlinear relationship. Sethares' model offers the best theoretical fit to the observed
relationship between roughness, frequency separation of the two interfering sines, and frequency of
the lower sine. In this model, the experimentally derived roughness curves (i.e. graphs plotting the
perceived roughness of a pair of sines with equal amplitudes as a function of their frequency
separation) are essentially interpreted as positively skewed, Gaussian-like distributions:

R(x) = e^(-b1·x) - e^(-b2·x)    Eq. (1)
where x represents an arbitrary measure of the frequency separation (f2 - f1), while b1 & b2 are the
rates at which the function rises and falls. Using a gradient minimization of the squared error between
the experimental data (averaged over all frequencies) and the curve described by Eq. (1) gives: b1 =
3.5 and b2 = 5.75. For these values, the curve maximum occurs when x = x* = 0.24, a quantity
interpreted as representing the point of maximum roughness. To account for the non-linearity in the
relationship between the fluctuation rate corresponding to maximum roughness and the frequency of
the lower sine, Sethares introduced the following modification, which includes the actual frequency
spacing (f2 - f1) and the frequency of the lower component (f1) into the calculation of roughness
(R):

R = e^(-b1·s·(f2 - f1)) - e^(-b2·s·(f2 - f1)),  where s = x*/(s1·f1 + s2)    Eq. (2)
The parameters s1 and s2 allow the function to stretch / contract with changes in the frequency of the
lower component so that the point of maximum roughness always agrees with the experimental data.
A least square fit gave s1 = 0.0207 and s2 = 18.96.
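The stretched curve of Eq. (2) can be sketched in a few lines of Python (an illustrative re-implementation under the equal-amplitude assumption, using only the parameter values quoted above; the function name is ours):

```python
import math

# Parameter values quoted in the text for Sethares' (1998) model
B1, B2 = 3.5, 5.75       # rise and fall rates of the roughness curve
X_STAR = 0.24            # frequency separation at maximum roughness
S1, S2 = 0.0207, 18.96   # least-squares stretch parameters

def sethares_pair_roughness(f1, f2):
    """Roughness of two equal-amplitude sines, following the stretched
    curve of Eq. (2): the point of maximum roughness tracks the
    frequency of the lower component via s = x*/(s1*f_low + s2)."""
    f_low = min(f1, f2)
    x = abs(f2 - f1)
    s = X_STAR / (S1 * f_low + S2)
    return math.exp(-B1 * s * x) - math.exp(-B2 * s * x)
```

The curve is zero at zero separation, rises to a single maximum at a separation that grows with the frequency of the lower sine, and decays towards zero for wide separations, as the experimental roughness curves require.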
Eq. (3)
b) The contribution of SPL to the sensation of roughness of AM tones (Eq. (4)) is negligible,
especially when compared to the contribution of the degree of amplitude fluctuation (Eq. (3)).
c) The roughness of a beating tone pair (f1, f2; A1, A2) is related to the roughness of an AM
tone as follows:
Eq. (5)
Eqs. (3), (4) & (5) illustrate that all existing models calculating the roughness of pairs of sines (Plomp
& Levelt, 1965; Kameoka & Kuriyagawa, 1969a & 1969b; Hutchinson & Knopoff, 1978, Sethares,
1998), have largely underestimated the importance of amplitude fluctuation depth (see note 5) (i.e. relative
amplitude values), while overestimating the importance of SPL (i.e. absolute amplitude values).
Combining Eqs. (2), (3), (4), & (5) gives the new model for the calculation of the roughness, R, of
pairs of sines (with frequencies f1 & f2, amplitudes A1 & A2, and zero initial phases):
Eq. (6)
The roughness of complex spectra with more than two sine components will be calculated by adding
up the roughness of the individual sine-pairs. Although von Békésy (1960: 350-351) has suggested
that the total roughness can be less than the sum of the roughness contributions of the individual
sine-pairs, depending on the relative phase of the respective amplitude fluctuations, initial
experiments indicated otherwise, confirming previous experimental results (Terhardt, 1974a.)
appropriate roughness range. A second group of subjects was asked to rate the stimuli on a
'dissonance' scale, outlined by the adjectives 'dissonant' - 'not dissonant'. No comparison stimuli were
included in this case, since the goal was to get at the assumed cultural associations of the terms
'consonance' and 'dissonance.' Subjects were able to listen to the stimuli as many times as needed
before making their decision. Preliminary analysis of the results indicates that the roughness ratings
correlate with the roughness of the stimuli as estimated by the proposed model (Eq. (6)) and that the
responses of the first group of subjects correlate with those of the second group.
v Summary - conclusions
All existing models quantifying roughness have demonstrated limited predictive power due, for the
most part, to:
a. an underestimation of the contribution of the degree of amplitude fluctuation (i.e. relative
amplitude values of the interfering sines) to roughness and
b. an overestimation of the contribution of SPL (i.e. absolute amplitude values of the interfering
sines) to roughness.
With the roughness calculation model introduced by Sethares (1998) (see Eq. (2) above) as a starting
point, a new model has been proposed (Eq. (6)), which includes a term that accounts for the correct
contribution of the amplitudes of interfering sines to the roughness of the resulting complex tone. This
term is based on existing experimental results (Terhardt, 1974a, von Békésy, 1960) with an additional
adjustment that accounts for the important quantitative difference between amplitude modulation
depth and degree of amplitude fluctuation (Vassilakis, in preparation.) The roughness of complex
spectra with more than two sine components is calculated by adding up the roughness of the
individual sine-pairs.
The final model has been tested experimentally and has been applied to the testing of a hypothesis
linking the consonance hierarchy of harmonic intervals within the Western chromatic scale to
variations in roughness degrees. Analysis of the pilot data indicates that, for isolated harmonic
intervals, the proposed roughness estimation model agrees well with observation.
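The pairwise-summation idea behind this test can be sketched as follows. This is an illustrative sketch only, not the paper's Eq. (6): the min-amplitude weighting is a simplifying assumption borrowed from Sethares-style formulations, and all names and spectra are ours.

```python
import math
from itertools import combinations

# Sethares (1998) curve parameters as quoted earlier in the text
B1, B2, X_STAR, S1, S2 = 3.5, 5.75, 0.24, 0.0207, 18.96

def pair_roughness(f1, a1, f2, a2):
    """Sethares-style roughness of one sine pair, weighted here by the
    smaller amplitude (an illustrative choice, not the paper's Eq. (6))."""
    f_low, x = min(f1, f2), abs(f2 - f1)
    s = X_STAR / (S1 * f_low + S2)
    return min(a1, a2) * (math.exp(-B1 * s * x) - math.exp(-B2 * s * x))

def interval_roughness(f0, ratio, n_harmonics=6):
    """Total roughness of two harmonic complex tones (amplitudes 1/n)
    a given frequency ratio apart, summed over all sine pairs."""
    partials = [(k * f0, 1.0 / k) for k in range(1, n_harmonics + 1)]
    partials += [(k * f0 * ratio, 1.0 / k) for k in range(1, n_harmonics + 1)]
    return sum(pair_roughness(f1, a1, f2, a2)
               for (f1, a1), (f2, a2) in combinations(partials, 2))
```

Even on this crude sketch, the perfect fifth (ratio 3:2) and the octave come out less rough than the tritone, in line with the hypothesised link between the consonance hierarchy of harmonic intervals and roughness.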
End Notes
1) The beating and roughness sensations associated with certain complex tones are essentially
understood in terms of sine-component interaction within the same critical band. However, studies
(von Békésy, 1960: 577-590; Plomp, 1966) examining the beating and roughness of mistuned
consonances (i.e. sine-pairs with frequency ratio slightly removed from a small integer ratio) indicate
that these sensations arise even when the added sines are separated by frequencies much larger than
the critical bandwidth. The experimental results of von Békésy and Plomp challenge earlier
explanations of this phenomenon that were based on the nonlinear creation of combination tones
(Helmholtz, 1885: 197-211) or harmonics (Wegel & Lane, 1924, in Plomp, 1966: 463; Lane, 1925)
inside the ear. Although their final interpretations differ, both studies link the beating and roughness
sensations of mistuned consonances directly to the complex signal's amplitude-fluctuations.
2) Changes in the rate of amplitude fluctuation exploit the differences not only between the beating
and roughness sensations but also between various degrees of roughness. Depending on the rate of
fluctuation, three 'shades' of roughness have been distinguished (von Békésy, 1960: 354.)
Approximately 45 fluctuations per second give roughness of an intermediate character, lying between
that of slower rates ("R" character) and that of higher rates ("Z" character.)
3) The Plomp & Levelt (1965) model underestimates roughness because of its bias against a power
function for roughness, while the Kameoka & Kuriyagawa (1969a&b) model overestimates roughness
because of its bias for a power function for roughness. The fact that some sort of power function
(although not exactly the one relating amplitude to loudness) is called for is supported by the
relationship between the mechanisms associated with the sensations of roughness and loudness. (Von
Békésy 1960: 344-350.)
4) If two sines with different frequencies f1, f2 (f1 < f2) and amplitudes A1 and A2 (A1 ≥ A2)
are added together, the amplitude of the resulting signal will fluctuate between a maximum
(Amax = A1 + A2) and a minimum (Amin = A1 - A2) value. The degree of amplitude fluctuation (Daf) is
defined as the difference between the maximum and minimum amplitude values relative to the
References
von Békésy, G. (1960). Experiments in Hearing. New York: Acoustical Society of America Press
(1989.)
Bigand, E., Parncutt, R., and Lerdahl, F. (1996). Perception of musical tension in short chord
sequences: The influence of harmonic function, sensory dissonance, horizontal motion, and musical
training. Perception & Psychophysics, Vol. 58: 125-141.
Carlton Maley,V. Jr. (1990). The Theory of Beats and Combination Tones, 1700-1863. [Harvard
Dissertations in the History of Science. O. Gingerich, editor.] New York: Garland Publishing Inc.
Dibben, N. (1999). The perception of structural stability in atonal music: The influence of salience,
stability, horizontal motion, pitch commonality, and dissonance. Music Perception, Vol. 16(3),
265-294.
Helmholtz, H. L. F. (1885). On the Sensations of Tone as a Physiological Basis for the Theory of
Music (2nd edition.) Trans. A. J. Ellis. New York: Dover Publications, Inc. (1954.)
Hindemith, P. (1945). The Craft of Musical Composition; Book 1 (4th edition). New York: Associated
Music Publishers Inc.
Hutchinson, W. and Knopoff, L. (1978). The acoustic component of Western consonance. Interface,
Vol. 7, 1-29.
Kameoka, A. and Kuriyagawa, M. (1969a). Consonance theory, part I: Consonance of dyads. JASA,
Vol. 45(6): 1451-1459.
_ (1969b). Consonance theory, part II: Consonance of complex tones and its calculation method.
JASA, Vol. 45(6): 1460-1469.
Plomp, R. and Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth. JASA, Vol. 38:
548-560.
Terhardt, E. (1974a). On the perception of periodic sound fluctuations (roughness). Acustica, Vol.
30: 201-213.
_ (1974b). Pitch, consonance and harmony. JASA, Vol. 55(5): 1061-1069.
_ (1984). The concept of musical consonance: A link between music and psychoacoustics. Music
Perception, Vol. 1(3): 276-295.
Vogel, M. (1993). On the Relations of Tone. Bonn: Verlag für systematische Musikwissenschaft,
GmbH [Lehre von den Tonbeziehungen, 1975. Bonn, trans. V. J. Kisselbach.]
Vos, J. (1986). Purity ratings of tempered fifths and major thirds. Music Perception, Vol. 3(3):
221-258. <
Zwicker, E., Flottorp, G., and Stevens, S. S. (1957). Critical band-width in loudness summation.
JASA, Vol. 29(5): 548-557.
Proceedings paper
5. Implications: Within the signifying process of singing, it is evident that facial behaviour
plays an important role in communication. A singer's awareness of his/her vocal expression,
and vocal self-perception more generally, can clearly be sharpened through awareness of
facial behaviour.
Paper
Intentions
The current study investigates empirically how voice, emotion and facial gesture might be
connected in Western classical singing. The need for such a study arose out of an
awareness of several issues. Firstly, facial expression in singing is often discussed
anecdotally, but has rarely been subjected to any empirical analysis. Secondly, singing
teachers often ask singers to 'sing with the eyes', 'make a smile' and so on, to achieve
technical ends in singing. Thus, it was considered important to know whether these different
facial expressions do in fact affect the quality of the produced vocalisation.
Thirdly, given the second point, it seemed important to know if there was an objective
correlation between the facial gesture and the sound made - in terms of its expressive
intention. From a perceptual perspective, for instance, does the audience understand more if
the singer looks as well as sounds 'sad', and what do these emotions objectively look and
sound like? Fourthly, it is well known that singers and actors often show empathy with an
emotional state, without entering into it completely or 'authentically'. In fact, singers 'act' out
emotions. It was a final intention of this work to explore the extent to which the emotional
expressions requested were perceived as being authentic by both the performer him/herself
and the audience. It was possible to match these data against 'norms' for emotional
expression in the face by comparing the profiles of the singers with measurements of facial
formation/musculature arrangement for real emotions recorded by Ekman and Friesen
(1969) from still photographs taken when people were subjected to specific emotion-eliciting
situations.
Background
According to Manen (1974), historically, Bel Canto vocal technique was a musical
exploration of the different vocal expressions for the different emotional states. So, in a
practical way there has been an exploration of vocal emotion in singing, but few systematic
empirical studies. Of the existing empirical work, key research has been undertaken by
Kotlyar and Morozov (1976) and Sundberg (1980) who have demonstrated that when asked
to sing with the emotional intentions of joy, sorrow, anger, fear, and no emotion, very
different spectrographic analyses of the vocal sounds emerge. In joy, for instance, there is
a much higher frequency than the other emotions, the tonal course of the pitches is
moderate, both up and down, the tonal colour comprises many overtones and the volume is
loud. In sadness, there is a much lower frequency produced, the tonal course of the pitches
is downward, the tonal colour is very restricted, with few overtones and the volume is soft.
Given the complete absence of research precedents for what happens to the face in singing,
it was hypothesised that the face would differ greatly according to the emotion, with
happiness involving very different gestures to sadness, as it was felt that there would be a
correlation between size of expression and loudness of sound produced. These hypotheses
were in part based on intuitions from everyday observations, but also emerged out of drawing
parallels with the work of Davidson (1994) who discovered that when a pianist was asked to
perform with different emotional intentions, the louder he played, the larger his movements
were in order to produce the sounds.
In the general emotional expression literature, key contributions to the field have been made
by Ekman and Friesen (1969) and their co-workers. Whilst they largely support Davidson's
findings, it is worth noting that in terms of musculature, fewer muscles are involved in
happiness than in the other emotions.
Linking these general research findings about musculature to singing technique, it is
important to note that in classical singing, the intention is generally to keep the larynx free, to
allow for optimum vibration. Additionally, the singer is taught to use the resonating cavities of
the face and the pharynx. To achieve this, vowel sounds are often modified from those used
in everyday speech, with the mouth opening rather more at the back than the front
(Helmsley, 1998). These factors may have an impact on how the face works when the highly
trained singer is asked to produce an emotional expression. In fact, there may even be some
source of conflict, with natural facial expression involving a muscle in one direction which
may need to function in another way for the sake of optimum vocalisation of the same
emotion when interfaced with the technique of singing.
Fonagy (1962) undertook some pioneering research examining glottal behaviour during
emotional speech. He found very different glottal behaviours for the different emotions,
with the ventricle folds being further apart in sadness or tender whispering, and the laryngeal
ventricle squeezed together for pressed phonation in anger, for instance. But, if, as
Helmsley (1998) argues, the emotions have to come first from the mind (thought of anger),
then through the eyes (visualise to realise the emotion) and eventually into the voice, it is
important to examine whether the facial musculature leads to the sound production or if the
sound production causes the facial expressions. The issues of learning, formation or innate
expression of emotion require careful consideration; by contrast, Fonagy (1962) refers to
the glottal profiles created in his study as 'pre-conscious expressive gestures'.
Singers are, of course, actors, often asked to characterise different people or emotional
states in their work. They are people who apparently learn to 'fake' behavioural moods.
Runeson and Frykholm (1983) believe that faked emotional states and expressions are
formed in a slightly different manner to genuine ones, and thus an expert viewer can
distinguish between the two. Of course, in singing, we accept that in an operatic
characterisation there is an element of dramatic play, and so perhaps even expect the
gestures and expressions used to be 'fake' or 'larger' than in real life. There is an expectation
which needs to be fulfilled. It seems critical for these reasons to compare data from singers
producing 'faked' emotional facial expression and sounds with those of genuine ones.
Alongside all of the research described above, it is important to note that recent
psychodynamic, phenomenological and therapeutic research has approached singing as a
creative musical experience occurring in both time and space, and existing precisely at the
threshold between 'self and the world' - a resonant field where self and other may experience
feelings of oneness and wholeness, a channel through which one expresses and
communicates something from the interior world. In other words, singing provides a bridge
between the inner world of mood, emotion, image, thought, experience and the outer world
of relationship, discourse and interaction (Salgado, 1999, Draffan, 2000). Thus, finally, to get
a deeper insight into the issue of emotion versus faked emotion, it is important to interview
performers to ask how they feel when producing these emotions, and to explore audience
reactions to the facial gestures used.
Methods
Recordings
Two male and one female singer (mean age 32 years) with an average professional
experience of 8 years singing in solo oratorio, opera and recital work were used as the
subjects of investigation. They were asked to prepare the musical phrase "Mein Vater, mein
Vater" from the Lied Erlkonig by Schubert for recording. This phrase was chosen since the
words could apply to almost any emotional state. The musical line itself both rises
and falls within a limited range of a third, so does not make particular technical demands on
the singer, and so again leaves interpretative possibilities open. Recordings were made in
five different conditions:
Neutral: this was based on Fonagy's (1962) procedure, which always used a neutral
emotional state against which to measure other emotional states. It acted as a base-line
measurement.
Happy.
Sad.
Fearful.
Angry.
These are the four fundamental emotions now widely reported as being the strongest and
most clearly recognisable in many contexts (Ekman and Friesen, 1969).
For the recording, it was necessary to video tape the singers in full-face close up.
Measurements of facial muscle activity were taken by digitising and tracking muscle activity
over time with a specially designed software package. To track the movements it was
necessary to mark the muscles to be mapped with 25 coloured circular stickers, 12 on each
side of the face and an anchor marker on the bridge of the nose. For the vocal recordings,
the betacam sound channel was used to input a spectrogram through software which allowed
for an immediate plot of the harmonic spectrum and the singer's formant. Additionally, an
electrolaryngograph was used to collect data about the opening and closing of the glottis.
This was recorded by placing two electrodes on the outside of the larynx. These were kept in
place using an elasticated neck strap.
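The digitising-and-tracking step can be illustrated with a minimal sketch. This is a hypothetical stand-in for the specially designed package, assuming each coloured sticker is located per frame by colour-thresholding and taking the centroid of the matching pixels:

```python
import numpy as np

def marker_centroid(frame, target_colour, tol=30):
    """Return the (row, col) centroid of pixels within `tol` of `target_colour`."""
    mask = np.all(np.abs(frame.astype(int) - target_colour) <= tol, axis=-1)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # marker occluded or out of frame
    return float(ys.mean()), float(xs.mean())

# Synthetic 100x100 RGB frame with one red sticker centred at row 40, col 60
frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[38:43, 58:63] = (255, 0, 0)

print(marker_centroid(frame, np.array([255, 0, 0])))  # -> (40.0, 60.0)
```

Muscle activity over time would then be read off as the frame-to-frame displacement of each marker centroid relative to the anchor marker on the bridge of the nose.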
After being videotaped, the singers, along with three other viewers, collectively watched the
recordings to assess the success of the tasks. From between two and four different attempts
at each emotional state made by each singer, the viewers assessed which were perceptually
the most/least authentic, and these bi-polar pairs were used as sources of data for analysis.
The singers were also interviewed about their views on emotion and singing and these
qualitative comments were used to help interpret the data.
Results
Sound analysis
Emotion - Neutral. In this condition, the singer's formant was not dominant, and the voice
was weak in both amplitude and harmonics.
Emotion - Sadness. Relatively low harmonic content compared with the other emotions, with
singer B's voice being far weaker in both amplitude and harmonics than in his other
examples. In all cases, the singer's formant (the strong harmonics between about 2500 and
3500 Hz) is not particularly dominant.
Emotion - Fear. Singers A and C both show very low harmonic content, with only the first
two harmonics coming out strongly in singer A's case. Singer C is totally lacking the singer's
formant, and singer A's is very weak, reflecting a tendency to half-voice or whisper to create
the impression of fear. For singer B, however, there is quite strong harmonic content,
indicating a much fuller voiced interpretation.
Emotion - Happy. Singers C and A still show a weaker harmonic content, though the
2500-3500Hz area is stronger than in the fearful example. Singer B's plot, however, is not
greatly different from that for fear.
Emotion - Anger. This showed the most dramatic change: both amplitude and harmonic
content are considerably greater for singers A and C, and slightly greater for singer B. Singer
C's singer's formant still seems relatively weak, but the overall amplitude for all his plots is
considerably weaker than for either of the other singers (that is, singer C is singing rather
more quietly).
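The comparisons above can be illustrated with a rough index of singer's-formant prominence: the share of spectral energy falling between about 2500 and 3500 Hz. The sketch below runs on synthetic tones and is not the analysis software used in the study:

```python
import numpy as np

def formant_band_ratio(signal, sr, lo=2500, hi=3500):
    """Fraction of spectral energy in the singer's-formant band (lo..hi Hz)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    band = (freqs >= lo) & (freqs <= hi)
    return power[band].sum() / power.sum()

# Synthetic "voices": a fundamental at 220 Hz, with and without a strong 3 kHz partial
sr = 16000
t = np.arange(sr) / sr
plain = np.sin(2 * np.pi * 220 * t)
ringing = plain + 0.8 * np.sin(2 * np.pi * 3000 * t)

print(formant_band_ratio(plain, sr) < formant_band_ratio(ringing, sr))  # True
```

A voice with a prominent singer's formant scores a higher band ratio; a half-voiced or whispered production, with little energy in that region, scores near zero.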
Visual analysis
Emotion - Neutral. In this condition, the measurement of the movements show a very limited
range of muscle activity, with a high degree of correlation between the three singers' use of
their faces in this condition.
Emotion - Sadness. Corrugator muscles are used extensively in this condition, with singer B
showing the most movement activity here, singer A, a moderate range of activity, and singer
C the least activity. There is a correlation between individual's data for the bi-polar pairings
of sadness recordings, showing that whether authentic or inauthentic emotion is expressed,
the same muscles are involved.
Emotion - Fear. Levator labii and frontalis muscles are used here, but the degree of
involvement varies according to individual and bi-polar interpretation. For instance, in Singer
B, both lots of muscles are equally involved in both interpretations of fear. For singer C,
there is limited activity in both. For singer A, there is little frontalis activity, but more in her
more authentic interpretation of fear.
Emotion - Happy. Zygomaticus major and risorius muscles are involved. As in the neutral
condition, there is a high degree of correlation between singers and bi-polar pairs.
Emotion - Anger. Platysma and procerus are involved. Here, singers A and B use very
similar formations and degrees of activity in both renditions of the emotion. For singer C,
there is less overall facial activity.
Conclusions
In summary, the data analysis reveals that both larynx and face move and work very
differently according to emotional state. The particular profiles of each share some common
characteristics. For instance, when singing with a sad expression, the face contracts,
reducing its overall surface area, and so too does the vocal sound, producing a more
breathy, whispered tone, indicating that the ventricle folds are further apart than in the other
emotional conditions. Interestingly, a similar result was obtained for the communication of
fear for singers A and C, whereas singer B used a much fuller sound and the frontalis was
lifted, as is typical in surprise (Ekman and Friesen, 1969). This suggests that the
three singers had slightly different interpretations of what fear was and how it was created.
One possible reason for the differences is that in the case of singer B, he was thinking about
the surprise within the Erlkonig itself, where the child suddenly, in a fearful state, calls out for
his father. Singers A and C said that they focused more on a generic expression of fear, and
did not have the element of surprise in mind when singing.
Whilst there was a high correlation for all states, especially happiness, it is worth mentioning
that all the singers noticed discomfort in their throats when singing in the angry condition.
Upon analysis of the data, it seems that in this state the muscle platysma is involved. Since
the singers were wearing throat straps, the two electrodes were forced downwards as the
platysma was brought into action, causing the discomfort. The sung effect was to create a
rougher, less vibrant tone.
Interview data can be touched upon here to show that all the singers believed that their
emotions were 'authentic' in that they were constructed out of memories of those states.
However, they were all able to recognise that in their different interpretations some were
more 'successful' than others. The quantitative data so far does not show any statistical
differences in these more or less successful interpretations. Thus, it is only possible to begin
to theorise about what might allow for the differentiation between the interpretations to occur.
From the commentaries of the singers and the viewers, it appears that there are qualitative
differences in the intensity of how the muscles are used. That is to say that if the singer is
clearly working with a stronger inner intention the effect is more successful.
References
Davidson, J.W. (1994) What type of information is conveyed in the body movements of solo
musician performers? Journal of Human Movement Studies, 6, 279-301.
Draffan, K. (2000) Singing from the Soul. Unpublished MMus Dissertation, University of
Sheffield.
Ekman, P. and Friesen, W.V. (1969) The repertoire of nonverbal behaviour: Categories,
origins, usage, and coding. Semiotica, 1, 49-98.
Fonagy, I. (1962) Mimik auf glottaler Ebene. Phonetica, 8, 209-219.
Helmsley, T. (1998) Singing and Imagination. Oxford: Oxford University Press.
Kotlyar, G.M. and Morozov, V.P. (1976) Acoustical correlates of the emotional content of
vocalised speech. Soviet Physics - Acoustics, 22, 208-211.
Manén, L. (1974) The Art of Singing. London: Faber Music Ltd.
Runeson, S. and Frykholm, G. (1983) Kinematic specification of dynamics as an
informational basis for person-and-action perception: Expectation, gender recognition, and
deceptive intention. Journal of Experimental Psychology: General, 112, 585-615.
Salgado, A. (1999) Rethinking voice evaluation in singing. Conference Proceedings of the
European Society for the Cognitive Sciences of Music conference on Research Relevant to
Music Training Institutions, Lucerne, Switzerland, September.
Sundberg, J. (1980) Röstlära. Stockholm: Proprius Förlag.
Proceedings paper
Brinkman asked 32 high school instrumental music students to compose two melodies. Three judges
independently rated the melodies using a consensual assessment technique. That is, each judge was
asked to rate each melody on a 7-point scale ("low" marked on one end and "high" on the other) in the
categories of originality, craftsmanship and aesthetic value. The reliability of the three judges'
creativity ratings of the 64 melodies ranged from .77 to .96.
Reliability figures for three judges' ratings of 14 children's musical compositions ranged from .62 to .73
for creativity and .81 to .95 for craftsmanship in a study by Hickey (1995).
The reliability of "creativity" ratings for children's musical compositions was .93 in another study by
Hickey (1996).
Most recently, Hickey (in process) examined one of the assumptions of the consensual assessment
technique: that "experts" must be used as assessors of the creative products. In the domain of
music, just who are the "experts" when it comes to dealing with children's compositional products?
Amabile answers this question for visual art and problem solving studies which used the consensual
assessment technique by reporting: ". . . . we are now convinced that for most products in most
domains, judges need not be true experts in order to be considered appropriate" [for judging products]
(1996, p. 72). She qualifies this, however, by stating that in many domains, some form of training in the
field may be necessary for judges "to even understand the products they are assessing" (p. 72) and
specifically cites computer-programming tasks and judging portfolios of professional artists. Based on
analyses of several studies, Amabile concludes with a suggestion:
...the level of judge expertise becomes more important as the level of subjects' expertise in
the domain increases. In other words, the judges should be closely familiar with works in
the domain at least at the level of those being produced by the subjects. (1996, pp. 72-73)
The purpose of the present study is to report the findings from two recent experiments which use the
consensual assessment technique, in order to refine this technique in music and to find which group of
"experts" is best qualified to assess children's musical compositions. The studies and results are
reported next.
Study A
The purposes of this study were to: a) determine which group of judges (composers, theorists, music
teachers, or children) would make the most reliable creativity ratings of children's musical
compositions; and, b) determine the relationships of mean creativity scores between these groups of
judges.
Subjects
Five groups of judges' creativity assessments of children's musical compositions were compared. The
groups were: 17 music teachers, 3 composers, 4 music theorists, 14 seventh-grade children, and 25
second-grade children. The music teachers were broken down into the following groups for analysis: 10
"instrumental" music teachers (teachers who taught only junior or senior high school band/orchestra); 4
"mixed experience" teachers (teachers who taught a combination of instrumental and choral or
instrumental and general music); and 3 "general/choral" music teachers (elementary general music
teaching with some choral music). From the group of composers, two were college composition
professors, and the 3rd was a graduate student in composition. All had at least 15 years of experience
writing music in a wide variety of genres ranging from jazz to classical. The music theorists were
college theory professors. The two groups of children came from contained classrooms in a private
grade-school.
The 11 musical compositions which were rated by all of the judges were randomly selected from a pool
of 21 compositions generated by fourth- and fifth-grade subjects in a previous research study (Hickey,
1995). In the 1995 study, the subjects were given unlimited time to create an original composition using
a synthesizer connected via MIDI interface to a Macintosh computer. The final compositions were
captured in MIDI file format using a computer program that allowed the recording of up to three
simultaneous tracks of music. No compositional parameters were given. Students were encouraged to
re-record their compositions as often as necessary until they were satisfied with their finished product.
Procedure
Amabile (1983) recommends that in order to assure discriminant validity between other areas and
creativity, that dimensions such as craftsmanship and aesthetic appeal be included on the rating form.
The form used by the theorists and composers for this study was developed by combining and adapting
items from Amabile's Dimensions of Creative Judgment (1982) and Bangs' Dimension of Judgment
rating forms (1992). The final form was used and tested in two previous studies (Hickey, 1995; 1996).
The rating form contained 18 items which fell under one of three dimensions: creativity, craftsmanship,
and aesthetic appeal. The items consisted of 7-point Likert-type scales with anchors marked "low,"
"medium," and "high." The music teachers in this study used a 3-item form with 7-point rating scales
for creativity, craftsmanship and aesthetic appeal. The creativity item for the music teachers, theorists
and composers was worded: "Using your own subjective definition of creativity, rate the degree to
which the composition is creative."
The children rated the compositions first for "Liking," and on a second listening, for "Creativity," using
a separate form for each scale. The Creativity form asked the students to rate each composition on a
5-point scale with "Not Creative" and "Very Creative" marked on the low and high ends. The
second-grade children's form had icons (from plain to more elaborate/silly faces) at each point on the
scale to aid them in understanding the continuum.
The groups of children listened to the compositions together in their respective classrooms. Before
listening to and rating the compositions, the author engaged the children in discussion about "Liking"
music and/or thinking that music is "Creative." The children shared ideas about what "creative" meant
to them and the discussion was guided to help them focus on understanding this term for rating music.
They then rated each composition first for "Liking," and on a second listening, for "Creativity."
All of the judges were informed that the compositions were composed by fourth- and fifth-grade
children. And, following Amabile's suggestion for proper consensual assessment technique procedures
(1996), the judges rated the compositions independently and were instructed to rate them relative to one
another rather than against some "absolute" standard in the domain of music.
Results
The analyses in this report are based on the judges' ratings on only the Creativity item of the assessment
forms. Interjudge reliabilities were calculated using "Hoyt's analysis," an intra-class correlation
technique which reports a coefficient alpha (Nunnally & Bernstein, 1994). The statistics were computed
using GB-Stat (1994) software. Because each group had a different number of judges, reliability
coefficients were adjusted in order to compare the groups as if only 3 judges were used in each group.
The adjusted interjudge reliabilities for each group's creativity ratings on the musical compositions
were: composers, .40; all music teachers, .64; instrumental music teachers, .65; "mixed" teachers, .59;
general/choral teachers, .81; music theorists, .70; seventh-grade children, .61; and second-grade
children, .50.
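The reliability computation can be sketched as follows. Hoyt's analysis yields a coefficient alpha equivalent to Cronbach's alpha computed over the compositions-by-judges ratings matrix, and the adjustment to a common panel of 3 judges can be made with the Spearman-Brown formula. The ratings below are invented for illustration:

```python
import numpy as np

def coefficient_alpha(ratings):
    """Coefficient alpha (Hoyt's analysis / Cronbach's alpha) for a
    compositions x judges matrix of ratings."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                       # number of judges
    judge_vars = ratings.var(axis=0, ddof=1)   # each judge's rating variance
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - judge_vars.sum() / total_var)

def adjust_to_panel(alpha, n_judges, k=3):
    """Spearman-Brown: project reliability from n_judges to a panel of k judges."""
    r = k / n_judges
    return (r * alpha) / (1 + (r - 1) * alpha)

# Invented ratings: 11 compositions x 5 judges on a 7-point creativity scale
rng = np.random.default_rng(0)
quality = rng.integers(1, 8, size=11)
ratings = np.clip(quality[:, None] + rng.integers(-1, 2, size=(11, 5)), 1, 7)

alpha = coefficient_alpha(ratings)
print(round(adjust_to_panel(alpha, n_judges=5), 2))
```

With the Spearman-Brown adjustment, a group's reliability no longer depends on how many judges it happened to contain, so figures can be compared across groups of different sizes.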
The correlations of mean creativity ratings among the different groups of judges are presented in Table 1.
Due to the lack of agreement among the composers, each composer is represented separately rather
than using the group mean for correlation with the other groups. Significant correlations were found
between the three groups of music teachers, between the music teachers and music theorists, and
between the two groups of children. Though music teachers and music theorists agreed with each other,
and the groups of children had a high positive correlation with each other, the theorists and teachers
showed moderate to low correlations with the groups of children. There were no strong positive
correlations amongst the composers nor between the composers and the other groups.
Table 1
Correlations of Mean Creativity Ratings Between Groups of Judges

Judges                         1     2     3     4      5     6      7     8     9
1.  Composer A
2.  Composer B               -.02
3.  Composer C                .07  -.26
4.  Music Theorists           .16  -.02   .58
5.  All Music Teachers        .35   .01   .37   .90**
6.  Instrumental Teachers     .45  -.09   .39   .88**   -
7.  Mixed Teachers            .18   .11   .35   .86**   -    .78**
8.  General/Choral Teachers   .14   .17   .19   .63*    -    .68*   .72*
10. 2nd-grade Children        .19  -.03   .19   .38    .18   .11    .41  -.01  .83**

**p < .01, *p < .05
Study B
The purpose of this most recent study was to test the reliability of a one-item creativity rating form
using the consensual assessment technique for rating children's musical compositions and to test the
reliability of a small group of judges.
Subjects
The judges in this study were 6 music teachers who came from slightly varied teaching backgrounds.
Three of these teachers were actively teaching music composition to students in their general music
classes. One was a high school band and general music teacher, and the other two were middle school
general music teachers. Two judges were elementary general music teachers who had taught music
composition to their students on only a few occasions in the past (music composition was not a regular
part of their curricula). The final judge was a student teacher placed in elementary-level general
music.
The 53 compositions that were rated in this study were created by 28 third-grade children (8 and 9
years of age). The children were volunteers who came to the University over three, 2-hour Saturday
sessions to learn about music composition using Macintosh computers and synthesizers. The students
were shown how to use simple music-sequencing software with Korg X5D synthesizers to create
original music compositions. The compositions collected for this study were composed on the first and
third day of the sessions. The children's instructions were to simply create a composition that they
liked. They could use as many tracks as they wished, and any combination of the available 128 General
MIDI timbres. They were given as much time as needed (no child needed more than 45 minutes) and
could revise and re-record as much as needed until they were satisfied with their composition. Several
children recorded more than one composition during each of these sessions. They were asked to choose
their favorite composition for purposes of this study. Twenty-five of the children completed two
compositions for the project, while three children completed only the first session.
Procedure
The MIDI compositions were converted to audio files and saved onto a CD-ROM for judges to listen to.
Each judge received a CD with the 53 compositions in a different and random order. The judges then
independently listened to the compositions and rated each on creativity using a 7-point Likert-type
scale with anchors marked "low," "medium," and "high." The instructions for rating each composition
were: "Using your own subjective definition of creativity, rate the degree to which the composition is
creative."
Results
The average creativity score for all 53 compositions was 3.8, with a range from 1.34 to 6.17. Interjudge
reliabilities were calculated using "Hoyt's analysis," an intra-class correlation technique which reports a
coefficient alpha (Nunnally & Bernstein, 1994). The statistics were computed using GB-Stat (1994)
software. The reliability coefficient for all 6 judges was .61 (p < .01). To test the hypothesis formed
from the results found in Study "A" (that general/choral elementary teachers are the best experts
in judging children's compositions), I calculated reliability coefficients for a variety of combinations of
judges to see which produced the best reliability. The best reliability figure was .65 (p < .01) when
calculated without the high school band/general music teacher.
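The search over judge combinations can be sketched as an exhaustive scan of subsets, scored by coefficient alpha (the intra-class reliability used above). The data here are deliberately artificial, with judge 0 constructed to disagree with an otherwise unanimous panel:

```python
import numpy as np
from itertools import combinations

def coefficient_alpha(ratings):
    """Coefficient alpha for a compositions x judges ratings matrix."""
    k = ratings.shape[1]
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - ratings.var(axis=0, ddof=1).sum() / total_var)

def best_panel(ratings, size):
    """Judge subset of the given size with the highest interjudge reliability."""
    panels = combinations(range(ratings.shape[1]), size)
    return max(panels, key=lambda p: coefficient_alpha(ratings[:, p]))

# Toy data: 53 compositions x 6 judges; judges 1-5 agree, judge 0 rates inversely
quality = np.arange(53) % 7 + 1.0
ratings = np.tile(quality[:, None], (1, 6))
ratings[:, 0] = 8 - quality

print(best_panel(ratings, size=5))  # -> (1, 2, 3, 4, 5)
```

Dropping the dissenting judge recovers the most reliable panel, mirroring how excluding the high school band/general music teacher raised the coefficient in Study B.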
Discussion and Implications for Further Research
The main purposes of this paper were to describe the conceptual background of and technique for using
consensual assessment to rate the creativity of children's music compositions and to share results of
research in which this particular assessment technique is used. Study "A" sought to determine who
might be the best group of experts to judge the creativity of children's musical compositions when using
a consensual assessment technique. Based on the results of this study, it seems that the best "experts,"
or at least the most reliable judges, may be the very music teachers who teach the children: the
general/choral music teachers. Perhaps the extensive music training that music teachers have, along with
their experience in the classroom with children provides them with the tools necessary to make
consistent and valid judgments about the creative quality of children's original musical products. It is of
interest to note that the composers used in this study were the group least able to come to an agreement
on the creativity of the children's compositions. In music education in the United States, music
composition is sometimes viewed as "mysterious," and often, the only experts considered in this realm
are the professional composers. Perhaps music teachers should have reason to feel more confident in
their ability to accurately assess the relative creativity of their students' musical compositions.
Study "B" further tested the reliability of subjective creativity assessment of children's musical
compositions and also examined the differences in judges' ratings based upon their teaching
backgrounds. The best reliability figure was obtained without the high school band/general music
teacher. This corresponds with study "A" in that perhaps the high school teacher does not have the same
sense, hence criteria, of what young children are capable of creating in musical compositions and may
not be the best "judge" for rating creativity in children's musical compositions.
The reliability of .65 is significant, yet lower than figures obtained previously. One reason may be that
the rating form asked judges to rate the musical compositions on only one item for "creativity."
Amabile suggests that at least aesthetic appeal and craftsmanship (in addition to creativity) be used as
items to rate creative products in order to force judges to think more carefully about the "creative"
aspects of the product. Though rating on only one item is easier and quicker for judges, this method may
prove less reliable for consensual assessment than using at least three rating items.
Another way to make this procedure more reliable is to include a general creativity definition for the
judges. This definition would be that creative musical compositions are both original and "appropriate"
(this seems to be the most common definition in the literature [Amabile, 1996]). Amabile (1996) suggests
that a definition may be needed when judges are uncomfortable with the idea of rating the creativity of
products in the absence of a guiding definition.
A final hypothesis for the unsatisfactory reliability coefficient in Study "B" is that children at this age
(8 and 9 years) are not developmentally "ready" or able to create an original and musically satisfying
composition. The compositions that were rated very high or very low may have been rated so by chance
and not through any real intent or ability. We need more research in our field to understand the
developmental trend of creative musical thinking in children in order to test this hypothesis.
Why bother with this pursuit of consensual assessment for rating creativity in children's composition?
For one, as mentioned briefly above, it is to show that teachers indeed do know, and can reliably
assess, the creative quality of children's compositions without the need for clear-cut objective criteria.
Of course, criteria for assessing compositions should be made clear to children when the consequence
might be a grade, but these studies show that teachers naturally have a subjective sense of which
compositions are more or less creative when compared to others.
Using a subjective consensual assessment technique, one might collect and examine the compositions
from children which are consistently rated as highly "creative." What are the features of these
successful compositions? From these compositions we may be able to formulate sensible rubrics to aid
in assessing children's musical compositions in schools. Furthermore, compositions rated highly
"creative" could also be used as models for elementary music classrooms-models are desperately
needed for teachers who strive to do more musical composition activities in their classrooms.
The subjective consensual assessment of children's musical compositions, for the most part, has
worked. It may provide the most appropriate measure because of its subjectivity and because it does
not presume objective criteria for creativity. This line of research may prove fruitful for the pursuit of
understanding better the genesis and factors surrounding a creative musical "aptitude" in children. In
order for consensual assessment to serve as the procedure for this identification, however, the next step
is to identify children who repeatedly produce musical compositions that experts rate as creative, and
then to pursue answers to questions about these children: What social and external factors surround
these children's backgrounds? Is there a relationship between scores on
general as well as musical creativity tests and the creative musical production? Is there a relationship
between musical creativity (based on musical composition assessment) and musical "aptitude" (based
on a standardized musical aptitude test)?
Creative musical thinking in children is a complex phenomenon in need of further study. The use of the
consensual assessment technique for identifying creative musical compositions and their creators may
prove to be the most reliable measure to aid in this research endeavor.
References
Amabile, T. M. (1982). Social psychology of creativity: A consensual assessment
technique. Journal of Personality and Social Psychology, 43, 997-1013.
Amabile, T. M. (1983). The social psychology of creativity. New York: Springer-Verlag.
Amabile, T. M. (1996). Creativity in Context. Update to The social psychology of
creativity. Boulder, CO: Westview Press.
Bangs, R. L. (1992). An application of Amabile's model of creativity to music instruction:
A comparison of motivational strategies. Unpublished doctoral dissertation, University of
Miami, Coral Gables, Florida.
Brinkman, D. (1994). The effect of problem finding and creativity style on the musical
compositions of high school students. Unpublished doctoral dissertation, University of
Nebraska, Lincoln.
Brown, R. T. (1989). Creativity. What are we to measure? In J. A. Glover, R. R. Ronning,
& C. R. Reynolds (Eds.), Handbook of creativity, (pp. 3-32). New York: Plenum Press.
Cattell, R. B. (1987). Intelligence: its structure, growth and action. Amsterdam: Elsevier
Science Publishers.
GB-Stat [Computer software]. (1994). Silver Spring, MD: Dynamic Microsystems, Inc.
Guilford, J. P. (1950). Creativity. American Psychologist, 5, 444-454.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Hickey, M. (1995). Qualitative and Quantitative Relationships Between Children's
Creative Musical Thinking Processes and Products. Unpublished doctoral dissertation,
Northwestern University, Evanston, IL.
Hickey, M. (1996). Consensual Assessment of Children's Musical Compositions.
Unpublished paper presented at the Research Poster Presentation, New York State School
Music Association Convention.
Hocevar, D., & Bachelor, P. (1989). A taxonomy and critique of measurements used in the
study of creativity. In J. A. Glover, R. R. Ronning, and C. R. Reynolds (eds.), Handbook
of creativity, pp. 53-76. New York: Plenum Press.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York:
McGraw-Hill.
Torrance, E. P. (1966). Torrance tests of creative thinking. Princeton, NJ: Personnel Press.
Torrance, E. P. (1974). The Torrance tests of creative thinking: Technical-norms manual.
Bensenville, IL: Scholastic Testing Services.
Torrance, E. P. (1981). Thinking creatively in action and movement: administration,
scoring, testing manual. Bensenville, IL: Scholastic Testing Service, Inc.
Webster, P. & Hickey, M. (1995, Winter). Rating scales and their use in assessing
children's compositions. The Quarterly Journal of Music Teaching and Learning, VI (4),
28-44.
Proceedings paper
Very few biological studies have investigated how we perceive musical rhythm. In the present research, we aimed to
find the cerebral bases of perceptual and cognitive processes of rhythm using Functional Magnetic Resonance Imaging
(fMRI). Many authors (e.g. Lerdahl and Jackendoff, 1983; Povel and Essens, 1985; Parncutt, 1994; Drake, 1998; see
Clarke, 1999 for a recent review) have proposed the existence of two types of temporal processes that appear
fundamental in the perception of simple rhythmic sequences: the segmentation of an ongoing sequence into groups of
events on the basis of their physical characteristics and the extraction of underlying temporal regularities.
The first process ("grouping") is based on gestalt principles and depends, among other physical characteristics, on the
relative proximity in time of sound events. The occurrence of a longer gap between two events creates a boundary
leading to the perception of two distinct perceptual units. A sound sequence is thus perceived as a succession of
rhythmic groups.
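This boundary principle can be illustrated with a small sketch (not the authors' model): a new group starts whenever an inter-onset interval is markedly longer than the one before it. The 1.5 threshold is an arbitrary illustrative choice.

```python
def segment_by_gaps(onsets, ratio=1.5):
    """Split a list of onset times (in seconds) into rhythmic groups.
    A boundary is placed where an inter-onset interval exceeds
    `ratio` times the preceding interval (illustrative threshold)."""
    groups = [[onsets[0]]]
    prev_ioi = None
    for a, b in zip(onsets, onsets[1:]):
        ioi = b - a
        if prev_ioi is not None and ioi > ratio * prev_ioi:
            groups.append([])      # longer gap: start a new group
        groups[-1].append(b)
        prev_ioi = ioi
    return groups

# Two groups of three events separated by a long gap
print(segment_by_gaps([0.0, 0.25, 0.5, 1.8, 2.05, 2.3]))
# → [[0.0, 0.25, 0.5], [1.8, 2.05, 2.3]]
```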
The second fundamental process ("meter") may occur in parallel. It corresponds to the extraction of temporal regularities
in the sequence in the form of an underlying pulse, often influenced by the alternation of strong and weak beats in the
musical sequence (that is, the metric structure stricto sensu).
Both grouping and metric processes seem to depend on the complexity and the hierarchical structure of the musical
sequences but seem to be functional in early life (see Drake, 1998 for a review). Neuropsychological studies in
brain-lesioned or "surgical" patients have shown that the processing of these two rhythmic organizations could be
impaired relatively independently from one another, suggesting that processing metric and grouping characteristics may
involve different mechanisms and networks in the human brain (Peretz, 1990; Liégeois-Chauvel et al., 1998). In order to
investigate this problem, we asked subjects to compare pairs of short rhythmic sequences differing in the position of one
event moved towards another group ("grouping" condition) or slightly displaced in order to disrupt the underlying pulse
("metric" condition). During these tasks, we measured their brain activity in an fMRI scanner.
Method:
Subjects
Nine subjects (3 males, 6 females; mean age 25; all right-handed) participated in the experiment. All reported normal hearing
and no neurological history.
Stimuli
Our stimuli consisted of pairs of short (3.8 sec) rhythmic sequences separated by a silence of 1.7 sec. Each rhythmic
event consisted of a complex percussive sound with a fundamental pitch corresponding to A4 (F0=440Hz). Each sound
lasted 50ms and had a level of 90 dB SPL. The sounds were played through headphones via a pneumatic system. The
average level of the noise produced by the scanner was reduced to 75dB SPL by means of sound-protecting
headphones. Depending on the sequences, the number of events varied from 8 to 12.
Grouping task: In this task, the sequences were irregular, with alternating Inter Onset Intervals (IOI) of 250,
700 and 1300 ms in order to prevent subjects from extracting a regular pulse or metric structure (ratio
= 1:2.8:5.2). In half of the trials, one event was displaced from one group to the next. An example for one trial
is shown in Figure 1.
Metric task: In this task, the sequences were composed of the alternation of 3 IOI sharing integer ratios
1:2:4 (250, 500 and 1000 ms). In half of the trials, one event was delayed or advanced by 70 ms, locally
disrupting metric expectancies while still remaining within the same group. An example for one trial is shown in
Figure 2.
Baseline condition: Both rhythmic tasks alternated with the same control task. In this task, subjects heard a
continuous sound with the same pitch and timbre, with a total duration of around 10 seconds. We used such a
sound in order to remove any activity involving non temporal auditory processing. We decided to use a
continuous sound rather than regular filler patterns as other authors did (e.g. Sakai et al, 1999) in order to
prevent the subject from performing any cognitive task involving processing of temporal information
during the control condition.
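The two stimulus families can be sketched from the IOI vocabularies above. The sequence lengths, seeding, and helper names here are illustrative, not the authors' stimulus-generation code, and the 3.8 s total-duration constraint is not enforced:

```python
import random

GROUPING_IOIS = [250, 700, 1300]   # ms; non-integer ratios, no pulse
METRIC_IOIS = [250, 500, 1000]     # ms; integer ratios 1:2:4

def make_sequence(iois, n_events, rng):
    """Onset times (ms) of a rhythmic sequence of n_events events."""
    onsets = [0]
    for _ in range(n_events - 1):
        onsets.append(onsets[-1] + rng.choice(iois))
    return onsets

def metric_violation(onsets, index, rng, shift_ms=70):
    """'Different' trial in the metric task: delay or advance one
    event by 70 ms, locally disrupting the underlying pulse."""
    changed = list(onsets)
    changed[index] += rng.choice([-1, 1]) * shift_ms
    return changed

rng = random.Random(1)             # fixed seed for reproducibility
seq = make_sequence(METRIC_IOIS, 10, rng)
odd = metric_violation(seq, 4, rng)
print(seq)
print(odd)
```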
Procedure
Before scanning, subjects were presented with practice trials in order to check their understanding of the tasks. During
scanning, series of several pairs of sequences were presented for both conditions. The subjects had to press the right
button of a mouse if both sequences of one pair were identical and the left button if they were different. During the
control condition, they pressed the right and left buttons alternately as soon as the sound stopped (subjects were informed
that no reaction time was measured). Specific instructions were given to the subjects: they were not allowed to
inner-sing or tap the rhythm and had to wait until the end of the second sequence to give their answer. They had to focus
on the regularity in the metric task and on the succession of events in the grouping task (we did not mention the presence
of distinct groups).
Data Acquisition and Analyses
A Bruker (Karlsruhe, Germany) 2.0 T system equipped with a 30mT.m-1 gradient coil set for echo planar imaging (EPI)
was used to perform all studies. Measures were averaged over four different repetitions for both conditions.
Each acquisition, or volume, consisted of 32 transaxial gradient-echo planar (GE-EPI) 64*64 isotropic
(4 mm) brain slices, repeated every 5500 ms (repetition time), resulting in 120 scans in total. Before
statistical analysis, pre-processing of the images was performed, namely realignment, normalisation and
smoothing according to the procedures proposed by Friston et al. and as implemented in the Statistical
Parametric Mapping (SPM99b) software (Wellcome Department of Cognitive Neurology; Friston et al., 1995).
Statistical analysis was then performed: the single-condition paradigm (two tasks) was modeled using a
delayed (5.5 s) boxcar hemodynamic response function in the context
of the general linear model (Friston et al., 1995), resulting in a t statistic for each and every voxel. These t statistics were
transformed to z statistics. Voxels that survived the statistical criteria of significance (p < 0.01 one tailed) corrected for
multiple comparisons constitute a statistical parametric map (SPM). Anatomical identification of activated areas was
performed individually by mapping areas onto the subject's own anatomical normalized (T1) images (T1 images on T1
template). Following individual anatomical identification of activated areas for each subject, the identified activated
areas from multiple subjects were mapped onto the best fitted area of the normalized template T1 image in the Talairach
(Talairach and Tournoux, 1988) reference coordinate system.
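The modeling step described here, a delayed boxcar regressor fitted voxel-wise in a general linear model to yield a t statistic, can be sketched on simulated data. The block layout, effect size, and noise level below are invented for illustration and do not reproduce the study's design matrix:

```python
import numpy as np

TR = 5.5                # repetition time (s), as in the paper
n_scans = 120
delay_scans = 1         # 5.5 s hemodynamic delay = one scan

# Hypothetical block design: alternating 10-scan task/rest blocks,
# circularly shifted by the hemodynamic delay
boxcar = np.tile(np.r_[np.ones(10), np.zeros(10)], n_scans // 20)
regressor = np.roll(boxcar, delay_scans)

X = np.column_stack([regressor, np.ones(n_scans)])  # design matrix

rng = np.random.default_rng(0)
y = 2.0 * regressor + rng.normal(0, 1, n_scans)     # simulated voxel

beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = res[0] / (n_scans - X.shape[1])            # residual variance
c = np.array([1.0, 0.0])                            # contrast: task > rest
t = c @ beta / np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
print(float(t))         # large positive t at this simulated "voxel"
```

In SPM the resulting t maps are converted to z statistics and thresholded with a multiple-comparison correction, as described above.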
Results:
Several brain regions revealed significant activation during grouping and metric processing relative to the baseline
(control non-rhythmical task). Table 1 presents the areas significantly activated for both conditions.
Figures 3 and 4 report the activations obtained in the grouping and the metric conditions (N=9),
respectively, on a standardised brain. We
will review in turn the cerebral regions activated and identify the brain regions active in only one of the conditions.
Figure 3: Statistical parametric analysis (SPM{z}) for 9 subjects in the grouping condition. The foci of significantly
increased activities (shown in red) were rendered on the surface template of a standard brain as implemented in SPM99b
(Wellcome Department of Cognitive Neurology, London, UK). The dimmer the color, the deeper the activation.
Figure 4: Statistical parametric analysis (SPM{z}) for 9 subjects in the metric condition (see legend of Figure 3).
Temporal lobes
The temporal lobes (especially Brodmann areas [BAs] 22 and 40) were bilaterally activated in both tasks. This confirms
the idea that musical rhythm is processed in associative auditory areas in the temporal lobes. However, there was a
spread of activation in the anterior part of the superior and medial left temporal gyri (BA 22 and 38) in the metric task.
The spread of activation in the temporal regions was more posterior (towards BA 40) in the grouping than in the metric
condition. These results partly confirm the anterior/posterior dissociation between metric and "rhythmic"
contrasting tasks recently observed by Liegeois-Chauvel et al. (1998).
Frontal and Prefrontal lobes
A broad bilateral activation in the SMA (supplementary motor area) and some activation in the PMC (premotor cortex,
BA 6) were obtained in both grouping and metric tasks. This activation was spread towards more anterior regions in the
grouping task. The activation in motor-associated areas may seem surprising since any motor activity (in our case
pressing a button to give the answer) should have been eliminated by the motor activity in the control condition.
However, activation in these areas has already been reported during the processing of visual and auditory temporal
patterns (Tracy et al., 1999; Sakai et al., 1999; Schubotz et al., 2000), but all these tasks were assessed by the
reproduction (tapping) of the rhythmic stimuli. To our knowledge, this is the first report of the activation of
motor-related areas in a purely perceptual rhythmic task. This is also strong evidence against previous assumptions
that motor areas would only be involved in the programming of the reproduction of rhythmic sequences.
The frontal opercular areas (including Broca's area and its equivalent in the right hemisphere) were activated bilaterally
in both tasks. This pattern had already been observed in nonverbal rhythmic tasks (Schubotz et al., 2000). Since part of
the motor system (SMA and PMC) was involved in our tasks, it was not possible to assess whether the activity
in this frontal area was related to inner singing, linked with the articulation of verbal sounds, or to a
broad nonverbal time processing system. However, single-subject analyses revealed that activation in these
regions did not perfectly overlap between the two tasks. A right/left asymmetry of frontal opercular
activity was observed for each subject.
However, there was no consistency between subjects regarding the preferential use of the right or left side depending on
the task.
Middle frontal areas (including BA 45, 46 and 9) were also activated in most subjects, with a larger spread of activation
on the right side in the grouping task. This could be related to the memorization of the sequences during the comparison
since these brain regions have classically been proposed to mediate attentional processes and working memory when
listeners perceive melodies (Zatorre et al., 1994). However, selective attention to the time intervals could be a better
explanation since our inter-sequence delay was too short for the participants to completely rehearse the first sequence of
each pair.
Cerebellum
We found cerebellar activation in both rhythmic tasks. This also supports the involvement of motor areas in the
time-based activities (Tracy et al., 1999; Sakai et al., 1999; Schubotz et al., 2000). However, the cerebellum
is also considered to play a role in perceptual timekeeping tasks. The bilateral cerebellar activity was more lateral in the grouping task and
more medial in the metric task (including the vermis). We did not find any evidence of a posterior/anterior opposition as
reported by Sakai et al. (1999) in the case of reproduction of metric and non metric sequences.
Other areas
The superior parietal gyrus (BA 7) was activated on the right side of the brain in both tasks, as had been observed in
other rhythmic tasks (Platel et al., 1997; Sakai et al., 1999). This associative multimodal cortex is supposed to take part
in timing mechanisms during both perceptual and motor tasks (Sakai et al., 1999). This region has been proposed to be a
component of a general time-keeping system (Maquet et al, 1996; Sakai et al, 1999), sometimes associated with the
attentional binding of sequential events across time (Posner and Dehaene, 1994).
Finally, in 6 out of 9 participants, left occipital regions (BA 19) were activated in the grouping task only. This could be
explained by the use of mental imagery as a strategy to visualise the rhythmic groups and the displacement of an event
from one group to another. In fact, such a strategy was reported by most subjects during debriefing. The activation of the
visual cortex had already been observed for pitch discrimination in musical sequences (Platel et al., 1997), but had not
previously been observed in any rhythmic task.
Conclusions:
Our results show that common areas are used by subjects in rhythmic tasks based on two distinct cognitive processes.
The brain networks involved in these tasks comprised auditory and motor associated areas. The latter have already been
shown to be part of a general time processing system. Our study is the first to report an activation of motor structures
during a perceptual task using musical stimuli. Most brain imaging studies of rhythm used motor reproduction to assess
subjects' performance. The common areas activated in the metric and grouping tasks included both right and left
superior and middle temporal cortices, prefrontal and right superior parietal areas.
We found a large overlap of the areas activated in both tasks for our 9 subjects. Little activation was specific to one
of the two tasks. Even if a double dissociation between metric and nonmetric rhythmic processes has clearly been
evidenced in neuropsychological studies (Mavlov, 1980; Peretz, 1990; Liégeois-Chauvel, 1998), very little research has
been carried out on the interaction between these processes. Hence, there are still discrepancies concerning
the exact nature of each of these processes and their independence from each other. Further research may
show, as already suggested by Liégeois-Chauvel et al. (1998), that these two processes are less independent
than previously observed, especially since they occur in parallel. This could explain why so much overlap
was found between the metric and grouping tasks in our
study. Since grouping and metric features in musical sequences depend mainly on their hierarchical structures, we
plan, in our next brain imaging study, to focus on the relation between basic and hierarchical levels of
rhythmical sequences.
However, a few areas were activated in only one task. In the metric condition, we observed a large spread of activity in
the anterior part of the left temporal lobe and in the left medial frontal gyrus (premotor cortex). In the grouping
condition, a larger anterior activation was found in the right prefrontal cortex. Moreover, cerebellar activity showed
distinct patterns between the two tasks. Thus, although our results confirm the existence of a general time processing
system involved in both of our rhythmic conditions, there are specific regions related to only one of the two distinct
types of cognitive processes. However, variability between subjects was high, and no clear distinctive
activation profiles were shared by all subjects. Indeed, preliminary analyses grouping subjects according
to their musical expertise suggest that the cerebral networks used by expert musicians to process rhythm
are spread over more cerebral structures than those of nonmusicians. Further analyses will be carried out
to confirm this view.
References:
Drake, C. (1998) Psychological processes involved in the temporal organisation of complex auditory
sequences: universal and acquired processes. Music Perception, 16, 11-26.
Clarke, E. F. (1999) Rhythm and Timing in Music. In The Psychology of Music, 2nd Edition. D. Deutsch,
Ed. New York: Academic Press, 473-500.
Lerdahl, F., & Jackendoff, R. A. (1983). A generative theory of tonal music. MIT Press. Cambridge, MA.
Liegeois-Chauvel C, Peretz I, Babai M, Laguitton V, Chauvel P (1998) Contribution of different cortical
areas in the temporal lobes to music processing. Brain, 121(10), 1853-1867.
Maquet P, Lejeune H, Pouthas V, et al. (1996) Brain activation induced by estimation of duration : a PET
study. NeuroImage, 3(2), 119-126.
Mavlov, L. (1980) Amusia due to rhythm agnosia in a musician with left hemisphere damage : a
non-auditory supramodal defect. Cortex, 16, 331-338.
Parncutt, R. (1994) A perceptual model of pulse salience and metrical accent in musical rhythms. Music
Perception, 11(4), 409-464.
Peretz, I. (1990) Processing of local and global musical information in unilateral brain-damaged patients.
Brain, 113, 1185-1205.
Platel H., Price C., Baron J-C., Wise R., Lambert J., Frackowiak R.S.J., Lechevalier B., Eustache F., (1997)
The structural components of music perception : A functional anatomic study. Brain, 120, 229-243.
Posner, M. I. and Dehaene, S. (1994) Attentional networks. Trends in Neuroscience, 17, 75-79.
Talairach, J. and Tournoux, P. (1988) Co-planar stereotaxic atlas of the human brain: 3-dimensional
proportional system: an approach to cerebral imaging. Stuttgart: Thieme.
Tracy, JI, Faro, SH, Mohamed, FB, Pinsk, M and Pinus, A. (2000) Functional localization of a
"time-keeper" function separated from attentional resources and task strategy. NeuroImage 11, 228-242.
Zatorre, RJ., Evans, CE. and Meyer, E. (1994) Neural mechanisms underlying cerebral melodic perception
and memory for pitch. Journal of Neuroscience, 14, 1908-1919.
Proceedings paper
hand, generally do not leave their natal territory but defend it by patrolling its borders against
neighboring groups (Wrangham 1975; Ghiglieri 1984). Stable groups of territorial males must, in
other words, attract females from other territorial groups, and are thus in competition with one another
regarding migrating females. Female exogamy is also present in humans, characterizing a majority of
hunter-gatherer societies (Ember 1978). A trait shared by humans and chimpanzees can be assumed to
have been present in their common ancestor as well, whose social system accordingly featured groups
of associated males competing for migrating females. This global pattern of sociality is strikingly
reminiscent of the pattern of sociality associated with the evolution of genuinely cooperative chorus
synchrony in insects (Morris et al. 1978; Greenfield 1994), and raises the question of whether
synchronous chorusing might have been a factor in the evolutionary history of the
hominid-chimpanzee clade.
There are two major points of divergence in this history which have left representatives surviving to
this day, one being the late Miocene split between hominids (who eventually gave rise to Homo) and
ancestral chimpanzees some five to six million years ago, the other being the split between ancestral
common chimpanzees and ancestral bonobos a few million years later. Synchronous chorusing
appears to have played a role in both speciation events, since in either case the present day
representatives of one branch of either split, namely human beings and bonobos, appear to possess the
capacity for synchronous chorusing, a capacity not possessed by the common chimpanzee. As already
noted, the capacity for entrainment to an isochronous pulse is a cross-cultural human universal, and it
has been reported that bonobos engage in a unique vocal behavior without homologue in common
chimpanzees - so called "staccato hooting" - in which multiple individuals synchronize their hooting
to a common steady beat (de Waal 1988).
The possibility that the behavioral adaptation of entrainment for vocal synchrony arose twice by
independent evolution from the common ancestor of humans and chimpanzees implies that there must
have been strong predisposing factors for it in ancestral behavior, making the step to cooperative
synchrony a short one. Beyond the global pattern of sociality
already referred to there is an additional feature of chimpanzee behavior which might provide the key
in this regard, namely the so called "carnival display" (Reynolds and Reynolds 1965; Sugiyama 1969,
1972; Wrangham 1975, 1979; Ghiglieri 1984). When a subgroup of foraging chimpanzee males
discovers a ripe fruiting tree they tend to launch a noisy display of combined vocal and locomotor
excitement which attracts other males and females to the site. The new arrivals join the group display
before eventually settling down to feed on the newly discovered resource. The display is cooperative
in that it attracts additional mouths to the resource, and it is an honest signal of resource discovery,
since false alarms are likely to provoke retaliation.
Over time, the frequency of carnival displays on a given territory would tend to reflect a combination
of male cooperativity and abundance of fruit trees on that territory. This signal would provide a source
of potentially important information not only for other members of the territorial group, but for those
inter-territorially migrating females who are in the process of choosing a territory on which to settle
permanently for the rearing of their young. However, the voice resources of common chimpanzees are
such that the chaotic noise of their carnival display is unlikely to span even the diameter of their home
territory - measuring some four kilometers across - let alone penetrate the circle of surrounding
territories. This constraint imposed by vocal limitations could, however, be overcome by cooperative
male synchronous chorusing, allowing the amplitude of individual voices to sum to the extent of the
precision of their entrainment to a common, isochronous pulse. Since sound attenuates according to the
square of the distance, the summed signal's reach grows with the square root of the number of synchronized
voices: 4 perfectly synchronized males would double the reach of their signal, while 16 synchronized males
would quadruple its reach, allowing the carnival display to penetrate beyond territorial boundaries to
reach the ears of migrating females.
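The arithmetic behind these figures: if N voices sum in acoustic power, intensity at the source grows with N, and with intensity falling off as the square of distance, the radius at which the signal drops to a fixed audibility threshold grows with √N. A tiny sketch; the single-display reach `r0_km` is a made-up placeholder, not a measured value:

```python
import math

def chorus_reach(n_voices, r0_km=2.0):
    """Audible radius (km) of n perfectly synchronized voices, assuming
    power summation and inverse-square attenuation; r0_km is the
    hypothetical reach of a single chaotic display."""
    return r0_km * math.sqrt(n_voices)

print(chorus_reach(4) / chorus_reach(1))    # → 2.0 (4 males: double reach)
print(chorus_reach(16) / chorus_reach(1))   # → 4.0 (16 males: quadruple)
```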
It only remains to suggest that humans are the direct descendants of a subpopulation of the
human-chimpanzee common ancestor in which the selection pressure of female choice of migratory
target territory led to the evolution of temporal synchrony in the vocal behavior of males performing
the carnival display. Since the extent of summation of individual voices, and thereby the geographic
reach of the summed signal, depends on the precision of synchrony, a premium attached to timing
precision. It is further suggested that just as today we assist the precision of our musical timing by
relying on repetitive locomotor rhythms like foot tapping, this early break-away subpopulation of the
common ancestor supported the precision of its vocal timing by repetitively rhythmic locomotor
movements performed in place, that is, by a form of dancing. Synchronous chorusing and dancing to
an isochronous pulse in a group setting would seem to qualify as a form of music by a wide range of
construals of that elusive term (see, for example, the Bantu term "ngoma" (Keil 1979), the Blackfoot
"saapup" (Nettl 2000) and the old greek "mousiké" (Merker 1999)).
If the above evolutionary scenario has any merit, the ultimate roots of human music extend back to the
parting of ways between pre-chimpanzees and hominids through a late Miocene "breakthrough
adaptation" (synchronous chorusing) which allowed a subpopulation of group-living ancestors to
broadcast the witness of their voices regarding their own cooperativity and the resource-richness of
their territory to the tuned ears of migrating females in search of a suitable territory on which to settle
to rear their young. The present-day universal human capacity to keep time (entrain to an isochronous
pulse) would accordingly be an adaptation retained from this time, informing the cross-cultural
ubiquity of measured music as well as other uses of the human capacity to entrain to an isochronous
pulse. This evolutionary scenario for the origins of human music harbors far-reaching consequences
for numerous issues pertaining to the subsequent trajectory of human evolution, including the issues
of brain expansion, the evolution of language, and the relationship between human music and
language (see Merker, 2000).
Acknowledgement
Work on this paper was supported by a grant from the Bank of Sweden Tercentenary Foundation to
Bjorn Merker.
References
Arom, S. (1991). African Polyphony and Polyrhythm (Book V). Cambridge: Cambridge University
Press.
Arom, S. (2000). Prolegomena to a Biomusicology. In N. L. Wallin, B. Merker and S. Brown (Eds.)
The Origins of Music. Cambridge, Mass.: The MIT Press, Ch. 2.
Backwell, P., Jennions, M., and Passmore, N. (1998). Synchronized courtship in fiddler crabs. Nature,
391, 31-32.
Brown, D. B. (1991). Human Universals. New York: McGraw Hill.
Buck, J. (1988). Synchronous rhythmic flashing in fireflies. II. Q. Rev. Biol., 63, 265-289.
Buck, J. and Buck, E. (1978). Toward a functional interpretation of synchronous flashing in fireflies.
Am. Nat., 112, 471-492.
Ember, C. R. (1978). Myths about hunter-gatherers. Ethnology, 17, 439-448.
Fraisse, P. (1982). Rhythm and tempo, In D. Deutsch (Ed.) The psychology of music (pp. 149-180).
New York: Academic Press.
Ghiglieri, M. P. (1984). The chimpanzees of Kibale Forest. New York: Columbia University Press.
Greenfield, M. D. (1994). Cooperation and conflict in the evolution of signal interactions. Annual
Review Ecol. Syst., 25, 97-126.
Greenfield, M. and Roizen, I. (1993). Katydid synchronous chorusing is an evolutionarily stable
outcome of female choice. Nature, 364, 618-620.
Keil, C. (1979). Tiv Song. Chicago: The University of Chicago Press, Ch. 2.
McNeill, W. H. (1995). Keeping together in time. Dance and drill in human history. Cambridge,
Mass.: Harvard University Press.
Merker, B. (1999). Synchronous chorusing and the origins of music. Musicae Scientiae. Special Issue:
Rhythm, Musical Narrative, and Origins of Human Communication, pp. 59-73.
Merker, B. (2000). Synchronous chorusing and human origins. In N. L. Wallin, B. Merker and S.
Brown (Eds.) The Origins of Music. Cambridge, Mass.: The MIT Press, Ch. 18.
Morris, G. K., Kerr, G. E., and Fullard, J. H. (1978). Phonotactic preferences of female meadow
katydids (Orthoptera: Tettigoniidae: Conocephalus nigropleurum). Can. J. Zool., 56, 1479-1487.
Nettl, B. (1983). The Study of Ethnomusicology: Twenty-nine Issues and Concepts. Urbana:
University of Illinois Press.
Nettl, B. (2000). An Ethnomusicologist Contemplates Universals in Musical Sound and Musical
Culture. In N. L. Wallin, B. Merker and S. Brown (Eds.) The Origins of Music. Cambridge, Mass.:
The MIT Press, Ch. 25.
Pusey, A. (1979). Inter-community transfer of chimpanzees in Gombe National Park. In D. A.
Hamburg and E. McCown (Eds.) The great apes (pp. 465-79). Menlo Park, Calif.:
Benjamin/Cummings.
Reynolds, V. and Reynolds R. (1965). Chimpanzees of the Budongo Forest. In I. DeVore (Ed.)
Primate Behavior: Field Studies of Monkeys and Apes. New York: Holt, Rinehart and Winston, pp.
368-424.
Sugiyama, Y. (1969). Social behavior of chimpanzees in the Budongo Forest, Uganda. Primates, 9,
225-258.
Sugiyama, Y. (1972). Social characteristics and socialization of wild chimpanzees. In F. E. Poirier
(Ed.) Primate Socialization. New York: Random House, pp. 145-163.
de Waal, F. B. M. (1988). The communicative repertoire of captive bonobos (Pan paniscus) compared
to that of chimpanzees. Behaviour, 106, 183-251.
Williams, L. (1967). The dancing chimpanzee: A study of primitive music in relation to the vocalizing
and rhythmic action of apes. New York: Norton.
Wrangham, R. W. (1975). The behavioural ecology of chimpanzees in Gombe National Park,
Tanzania. Ph.D. dissertation. Cambridge, England: University of Cambridge.
Wrangham, R. W. (1979). On the evolution of ape social systems. Soc. Sci. Int., 18, 335-368.
Proceedings abstract
IS THE EMOTIONAL SYSTEM ISOLABLE FROM THE COGNITIVE SYSTEM IN THE BRAIN?
Isabelle Peretz
Department of Psychology
University of Montreal
C.P. 6128
succ. Centre-ville
CANADA
Background: A central question in cognitive neuroscience concerns the functional and neuroanatomical autonomy
of the emotion-recognition system with respect to the perception and memory systems. Such a distinction is well
established for faces. For music, a similar dissociation has recently emerged in the literature.
Aims: The goal of the presentation will be to summarise these recent studies performed in neuropsychology, with a
brain-damaged patient and with brain-imaging techniques.
Main Contribution: It will be suggested that emotion and recognition share a common perceptual analysis system but
differ in the type of structural characteristics that are needed to achieve their respective goals. For instance, minor and
major mode and tempo are important perceptual determinants of the happy-sad distinction in music. In contrast, mode
and tempo are of little importance for discrimination and identity recognition. Similarly, I will show that emotional
judgement of dissonance in subcortical structures cannot take place without initial perceptual analysis in the auditory
cortex.
Implications: Emotions cannot be totally divorced from structural organisation of the musical input. Emotional
judgements provide a novel and indirect way to study implicit knowledge of musical structure.
Proceedings paper
This research was presented as part of a Ph.D. thesis submitted to the Psychology Department of McGill University. Funding was provided in
part, by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), and in part by a team grant to A.S.
Bregman and R.J. Zatorre.
a) The first author, Martine Turgeon is currently affiliated with "Institut de Recherche en Coordination Acoustique/Musique" (IRCAM),
Perception et Cognition Musicales. Reprints are available from Martine Turgeon at IRCAM, 1 Place Igor-Stravinsky, 75004, Paris, France.
Introduction
Issues of interest
The perceptual organization of complex tones depends on the detection of biologically-relevant cues in the acoustic signal,
such as those providing evidence for a common spatial location of the components of a sound source, and those reflecting
spectro-temporal regularities typical of causally-related sounds such as simple harmonic ratios and temporal synchrony.
There is evidence that the auditory system perceptually groups sounds that share a common fundamental frequency
(McAdams and Smith, 1990), and a common spatial location of their sources (Kidd et al., 1998). Furthermore, Turgeon and
Bregman (1999) have shown that the fusion of noise bursts in a free field is promoted by temporal synchrony. Though the
contribution of many specific cues to auditory grouping has been established empirically (reviewed by Darwin and Carlyon,
1995), their interaction is poorly understood, especially in a free field. It is important to study grouping in the context of
many interacting cues, since in real-world situations, no grouping cue acts in isolation. The present study was conducted in a
free field with a semi-circular array of speakers to look at how the spatial separation of sound sources interacts with two of
the most robust cues for the grouping of concurrent tones: harmonicity and temporal synchrony.
Rationale of the Rhythmic Masking Release (RMR) paradigm
We used the RMR paradigm (Turgeon and Bregman, 1996) to study the relative contribution of onset asynchrony, deviations
from simple harmonic ratios, and the spatial separation of sources for the segregation of concurrent brief tones. In this RMR
study, a rhythm was perceptually masked by embedding identical tones irregularly among the regular tones. The rhythm is
camouflaged because no acoustic property distinguishes the regular subset of tones from the irregular one. We refer to the
irregular tones as "maskers"; though they do not mask the individual tones, they mask their rhythmic sequential organization.
"Captor" tones can be added in different critical bands simultaneously with the irregular maskers. These tones release the
rhythm from masking when they are completely simultaneous (Turgeon and Bregman, 1996); that is, temporal coincidence
fuses them perceptually. The newly formed masker-captor units have emergent properties, such as a different timbre and a
new pitch; this distinguishes the irregularly-spaced components from the regularly-spaced ones. The accurate perception of
the rhythm is thus contingent upon the fusion of the irregular maskers and captors. Measuring the listener's ability to identify
the embedded rhythm thus provides an estimate of the degree of perceptual fusion of the maskers and captors. We
manipulated the spatial, spectral and temporal relations between the maskers and captors to see how their fusion was affected
by these factors, using a two-alternative forced choice task, in which one of two rhythms was embedded in the sequence.
Objectives and hypotheses
Relative contribution of a common onset and offset, F0, and location of source on fusion. One of the objectives of the study
was to assess the relative importance of auditory-grouping cues by creating competition among them. For instance, suppose
that the masked rhythm sequence and the captors are presented in different speakers. While the common relation to a
fundamental frequency (F0) and the common speaker location of the masked-rhythm tones should promote their sequential
grouping, the temporal coincidence and common F0 between the maskers and captors should promote their simultaneous
grouping. If common spatial location and frequencies overcome the segregating effects of temporal synchrony, the rhythm
should remain perceptually masked; on the other hand, if temporal synchrony and a common F0 (among spatially and
spectrally distributed components) win the competition, the maskers and captors should fuse perceptually and the rhythm
should be heard clearly.
We expected simultaneity of onset and offset to make a much greater contribution to the fusion of complex tones than would
their harmonic relations or their separation in space. This expectation was based on the high ecological validity of temporal
coincidence for the perception of components as a single event as well as the empirical evidence showing its powerful effect
on the fusion of components (reviewed by Darwin and Carlyon, 1995; Turgeon, 1999). Despite the importance of simple
harmonic ratios for pitch perception (Hartmann, 1988), we did not expect them to have a strong effect on the fusion of our brief
tonal stimuli. This expectation was based on recent results showing that harmonicity only weakly affects the diotic and
dichotic fusion of the same stimuli over headphones (chapter 4 in Turgeon, 1999). The weakness of the harmonicity effect
was attributed to the short duration of the tones (i.e., 48 ms). The tones typically used to study the effect of harmonicity on
fusion range from one hundred to several hundred milliseconds in length.
Past results suggest that the perceptual organization of sounds is influenced by the spatial separation of sound sources (Kidd
et al., 1998). However, the results of a recent RMR experiment (Turgeon and Bregman, 1999), which presented noise burst
stimuli in a free-field setting, showed that presenting them in different speakers only weakly affected their fusion, compared
to when they came from the same speaker. Moreover, angular separations of the sources (∆θ) of up to 180 degrees were
not sufficient for the full segregation of synchronous or slightly asynchronous bursts, and the magnitude of ∆θ did not affect
the strength of the fusion. This weak effect can be contrasted with the strong effect of onset asynchronies (SOA) of 36 and 48
ms, which fully segregated the maskers and captors at all ∆θ's (from 0 to 180 degrees). The weak effect of ∆θ, compared to
SOA, might be related to temporal coincidence being a more robust cue than a common location in space for sound-source
determination. Unlike reflected light, sound travels around and through rigid surfaces (acoustically, such surfaces behave like transparent objects).
As a consequence, in estimating the point of origin of a sound (i.e., the spatial location of the vibrating source), echoes may
suggest more than one point of origin. We believe that echoes and reverberation present the auditory system with a degraded
signal, so that spatial information is often unreliable. Given these ecological considerations and the results of our earlier
free-field study with noise bursts (Turgeon and Bregman, 1999), we expected ∆θ to have only a weak effect on cross-spectral
fusion.
Temporal limits for event perception. Another objective was to evaluate the minimum temporal deviation from perfect
temporal synchrony which triggers the perception of concurrent tones as separate sound events. Onset asynchrony was
expected to have a powerful effect on cross-spectral grouping, because it is a highly reliable cue for the segregation of
sound-producing events. In a natural context, it is likely for sounds coming from different environmental sources to have
some degree of temporal overlap; however, it is unlikely that they happen to be perfectly coincident in time. Given the
adaptive value of detecting deviations from perfect coincidence, an empirical question of interest was to estimate the physical
range of tolerance for the perceived simultaneity of sound events. Past research in this laboratory addressed this issue by
estimating to what extent there could be a deviation from onset and offset synchrony before concurrent sounds were
perceived as separate events 75% of the time (Turgeon, 1999). Such an SOA threshold for perceiving separate events was
estimated to be between 28 and 35 ms, when brief complex tones were presented diotically and dichotically over headphones
(chapter 4 in Turgeon, 1999). In
that study, we estimated individual SOA thresholds, within each of four conditions: diotic and dichotic presentation of
maskers and captors, either harmonically related or not. When they were presented dichotically, which induced a difference
in perceived lateralization, the value of SOA required to segregate them was 12 ms lower than when they were presented
diotically. This was true for both harmonic tones (40 vs. 28 ms) and inharmonic ones (38 ms vs. 26 ms), a lower threshold
indicating less fusion of the maskers and captors. However, whether or not the tones shared a common F0 had little influence
on the SOA threshold (the SOA value required to segregate them). A difference in F0 caused only a 2-ms difference in mean
SOA thresholds, and the standard errors overlapped. Turgeon (1999) concluded that dichotic presentation, but not
harmonicity, influenced the temporal disparity between concurrent tones that was needed for their perception as separate
sounds.
The present experiment examined whether similar temporal limits hold for the presentation of the same stimuli in a free field.
We did not expect harmonicity to have a significant effect on SOA thresholds, though it might affect them weakly. Assuming
that the earlier observed effects of dichotic presentation had acted through differences in perceived lateralization, we
expected that larger angular separations of maskers from captors in a free field should diminish SOA thresholds.
Methods
Subjects. The listeners were 18 adults who were naive to the purpose of the experiment. All had normal hearing for the
250-8000 Hz frequency range, as assessed through a short air-conduction audiometric test.
Stimuli. Stimuli were synthesized and presented by a PC-compatible 486 computer, which controlled a Data Translation DT
2823 16-bit digital-to-analog converter. The rate of output was 20000 samples per second. Signals were low-pass filtered at
5000 Hz, using a flat amplitude (Butterworth) response with a roll-off of 48 dB/octave. Listeners sat at the center of a
semi-circular array of 13 speakers, each one meter away from the listener. The speaker array was situated in the sound-attenuated
chamber of Dr. Zatorre, at the Montreal Neurological Institute. The head of the listener was fixed so as to point in the
direction of the central speaker of the array. The RMS intensity level was the same for all the four-partial tones; it was
calibrated as equal to that of a 1000-Hz tone presented at 60 dB SPL at the central position of the listener's head, that is, at
the center of the array of speakers, one meter away from all of them. When temporally-overlapping tones were presented in
two different speakers (a four-harmonic tone was presented in each speaker) the RMS level was the same at each speaker.
Two rhythmic patterns were to be discriminated by the listeners. Each was repeated to form a sequence that had a total
duration of 9.5 seconds, was composed of 15 tones, and had a tempo of 1.7 tones per second. The two rhythms were different
temporal arrangements of a short 384-ms inter-stimulus interval (ISI) and a long 768-ms one. Rhythm 1 repeated an
alternation of short, long, short, long ISIs three and a half times. This gave rise to perceptual grouping of tones by pairs.
Rhythm 2 repeated a cycle of short, long, long, short ISIs three and a half times; this gave rise to perceptual grouping of
tones in which triplets alternated with a single tone. Both rhythms started and ended with an alternation of a short and a long
ISI. To perceptually camouflage each rhythm, irregular maskers were interspersed among the rhythmic tones. The rhythms
had a constant temporal density of one irregular masker for each 192-ms interval; there were thus two maskers in the short
384-ms ISI and four in the long 768-ms one. The variability in the distribution of irregular intervals was the same in all
conditions, including the no-captor controls. There was no overlap between the rhythmic and masking tones.
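The two rhythmic ISI patterns and the masker density described above can be sketched as follows. This is an illustrative reconstruction, not the original stimulus code; the function names are ours.

```python
# Sketch of the two rhythmic ISI patterns and the constant masker
# density described above (values in ms). Illustrative only.
SHORT, LONG = 384, 768  # short and long inter-stimulus intervals

def build_isis(cycle, n_isis=14):
    # Repeat a 4-ISI cycle three and a half times: 14 ISIs -> 15 tones.
    return [cycle[i % len(cycle)] for i in range(n_isis)]

rhythm1 = build_isis((SHORT, LONG, SHORT, LONG))  # tones grouped in pairs
rhythm2 = build_isis((SHORT, LONG, LONG, SHORT))  # triplets + single tone

def maskers_in_isi(isi):
    # One irregular masker per 192-ms interval: two maskers fit in a
    # short ISI and four in a long one.
    return isi // 192
```

Because both rhythms contain seven short and seven long ISIs, their total durations (and total masker counts) are identical, which is what makes them discriminable only by temporal arrangement.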
In any condition, the same spectrum was used for the rhythmic and masker tones: the same four harmonics of 300 or 333 Hz
of equal intensity. Together they formed the masked-rhythm sequence, which was presented in isolation for the no-captor
control conditions. In all the other conditions, some captor tones were added; they were composed of four harmonics either
of the same F0 as the maskers, or of a different F0. The four possible combinations of maskers and captors were: odd and
even harmonics of a 300-Hz F0; odd and even harmonics of a 333-Hz F0; odd harmonics of a 333-Hz F0 and even harmonics
of a 300-Hz F0; odd harmonics of a 300-Hz F0 and even harmonics of a 333-Hz F0. For each of these combinations, there
were two versions, one in which the maskers (and the rhythm) had the high pitch (even harmonics of 300 or 333 Hz), the
captors having the low pitch (odd harmonics of 300 or 333 Hz), and the other, which had them interchanged.
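A minimal synthesis sketch of one masker-captor spectrum combination follows. The linear ramp shape and the specific harmonic numbers (1-8) are assumptions for illustration; the paper specifies only the 48-ms duration, the 8-ms onset/offset, and the odd/even split of four equal-intensity harmonics.

```python
import math

SAMPLE_RATE = 20000  # samples per second, as in the paper

def make_tone(f0, harmonics, dur_ms=48.0, ramp_ms=8.0):
    # Four equal-intensity harmonics of f0 with onset/offset ramps.
    # Linear ramps are an assumption; the paper gives no ramp shape.
    n = int(SAMPLE_RATE * dur_ms / 1000)
    ramp = int(SAMPLE_RATE * ramp_ms / 1000)
    tone = []
    for i in range(n):
        t = i / SAMPLE_RATE
        s = sum(math.sin(2 * math.pi * h * f0 * t) for h in harmonics)
        gain = min(1.0, i / ramp, (n - 1 - i) / ramp)  # onset/offset ramps
        tone.append(gain * s / len(harmonics))
    return tone

# One of the four combinations: odd vs. even harmonics of the same
# 300-Hz F0 (harmonic numbers chosen hypothetically).
maskers = make_tone(300.0, (1, 3, 5, 7))
captors = make_tone(300.0, (2, 4, 6, 8))
```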
Each tone was 48-ms long, including an 8-ms onset and offset. The captors could be either simultaneous with the maskers or
delayed from them by 12, 24, 36 or 48 ms. The maskers and captors were of the same duration; hence, for each onset
asynchrony there was an offset asynchrony of the same duration. The amount of temporal overlap between the maskers and
captors varied from a full 48-ms overlap to no overlap. The asynchronous maskers and captors were aligned in phase during
their period of overlap so that the positive peaks of their waveforms were aligned at the period of their common F0. The
masked-rhythm sequence and the irregular captors were either presented in the same central speaker, or else in two different
speakers, equally distant from the central speaker. The speakers could be off center by 30, 60 or 90 degrees; these relative
positions of the sources of the maskers and captors yielded three angular separations (∆θ): 60, 120 and 180 degrees. For
each ∆θ, the presentation of the masked rhythm and captors on each side of the array was counterbalanced across trials.
Procedure. The subjects had to judge which one of the two rhythms was embedded in the sequence and how clearly it was
heard on a 5-point scale. After each trial, feedback about the accuracy of rhythm identification was provided. There was a
short training session. Listeners were told that they would hear a warning tone followed by one of the two rhythms that they
had previously heard in isolation. They were instructed to direct their attention to the location of the speaker that had sent the
warning tone and to tell which of two rhythms was played. The two isolated rhythms (without captors) were randomly played
at each of the 13 possible speakers until the listeners reached the criterion of 10 correct identifications in a block. This was
followed by a practice session which randomly presented each combination of SOA, ∆θ and harmonicity. This session
allowed the listeners to become familiar with the task and to hear the variations across the conditions, so as to better use the
full range of the rating scale. During the experiment proper, a 1000-Hz warning tone was played in
the speaker of the masked rhythm so that listeners could pay attention to the location of that rhythm. The listeners' heads
remained fixed despite their attention being directed to speakers in different locations.
Results
Computation of scores
Measure of rhythm sensitivity and response bias. Different accuracy measures were derived from listeners' responses: d'
scores, proportion-correct scores (PC) and weighted-accuracy (WA). WA weights the rated accuracy by the clarity of the
identified rhythm. For brevity, this short paper focuses on d' scores, occasionally reporting PC scores. The
d' scores and the response bias c were evaluated according to standard procedures (Macmillan and Creelman, 1991). The d'
scores measured sensitivity to Rhythm 1. In terms of Z (i.e., the inverse of the cumulative normal distribution function), d' is defined as
Z(H) - Z(F); where H is the proportion of Hits (i.e., Rhythm 1 is reported when it is physically present) and F is the
proportion of False Alarms (i.e., Rhythm 1 is reported when Rhythm 2 is physically present). In Z-score units, c is given by:
0.5 * [Z(H) + Z(F)]. A standard table of the normal distribution was used to convert H and F to Z-scores (Macmillan and
Creelman, 1991).
When listeners cannot discriminate at all between the two rhythms (i.e., chance-level performance), H=F and d'=0. On the
other hand, perfect accuracy implies an infinite d'. To avoid values of infinity in the computation of d', proportions of 1 and 0
were thus converted into 0.999 and 0.001 respectively. Proportions of 0.999 and 0.001 yield d' values of 6.18 and -6.18.
However, a lower value of d', namely, 4.65 is usually considered as the effective ceiling (Macmillan and Creelman, 1991);
this is obtained when H=0.99 and F=0.01. As for response bias, a positive c indicates a higher tendency to respond Rhythm
1, and a negative c a higher tendency to respond Rhythm 2. A mean bias parameter c close to the zero-bias point is thus
considered indicative of the absence of a systematic response bias for a given subject.
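The computation just described can be sketched in a few lines; `NormalDist().inv_cdf` plays the role of the normal-distribution table, and the clamping of proportions to 0.999 and 0.001 follows the paper. Note that the sign convention for c is the paper's (positive c means more Rhythm 1 responses).

```python
from statistics import NormalDist

def d_prime_and_bias(h, f):
    # h: proportion of Hits (Rhythm 1 reported when physically present)
    # f: proportion of False Alarms (Rhythm 1 reported for Rhythm 2)
    # Clamp 0 and 1 to 0.001 and 0.999 to avoid infinite Z-scores.
    h = min(max(h, 0.001), 0.999)
    f = min(max(f, 0.001), 0.999)
    z = NormalDist().inv_cdf  # inverse cumulative normal (Z-score)
    return z(h) - z(f), 0.5 * (z(h) + z(f))  # d', c
```

Perfect accuracy maps to a d' of about 6.18 after clamping, H=0.99 with F=0.01 gives the effective ceiling of about 4.65, and chance performance (H=F) gives d' = 0, matching the values quoted in the text.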
Estimates of asynchrony threshold for perceiving separate events. To obtain an estimate of the magnitude of stimulus onset
asynchrony (SOA) required for the perception of concurrent sounds as separate events, we determined the 75% SOA
threshold from psychometric "Weibull" functions for the individual listeners (Weibull, 1951). Separate SOA thresholds were
evaluated for the eight different spectro-spatial relations of this experiment (harmonic and inharmonic conditions for each
∆θ). For each of the eight ∆θ-by-harmonicity conditions, the mean goodness of fit (as measured by r, the Pearson correlation
coefficient) of the data was equal to or larger than 0.87.
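Under the two-alternative forced-choice form of the Weibull function (a standard parameterization; the paper does not spell out the one it used), the 75% point can be read off in closed form once the scale α and slope β have been fitted:

```python
import math

def weibull_pc(soa, alpha, beta):
    # 2AFC Weibull psychometric function: proportion correct rises from
    # chance (0.5) toward 1.0 as SOA grows. This parameterization is an
    # assumption; the paper does not give its exact form.
    return 1.0 - 0.5 * math.exp(-((soa / alpha) ** beta))

def soa_threshold_75(alpha, beta):
    # Solve weibull_pc(x) = 0.75:
    #   0.5 * exp(-(x/alpha)^beta) = 0.25  =>  (x/alpha)^beta = ln 2
    return alpha * math.log(2) ** (1.0 / beta)
```

In practice α and β would be fitted per listener and condition (e.g. by least squares or maximum likelihood) before the threshold is extracted.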
Description of the main trends in the results
No-captor controls and measures of biases. The no-captor controls yielded mean PC of 0.54 (SE=0.03) and mean d' of 0.26
(SE=0.12); this is close to the chance level PC performance of 0.5 and d' performance of 0. This verifies that the rhythm was
perceptually masked in the absence of captors. The results also verified that there was no bias for one rhythm over the other.
For the conditions with captors, the mean response bias parameter c for the 18 listeners was -0.03 (SE=0.05); for their
no-captor counterparts, the mean c across individuals was -0.009 (SE=0.09). Given the mean c value very close to zero and
the small standard errors (SE), we concluded that response bias did not diminish the power of our statistical comparisons.
Effect of stimulus onset asynchrony (SOA). For each of the eight ∆θ-by-harmonicity conditions with no temporal asynchrony,
the rhythm-identification performance was at the ceiling value, namely, a PC of 0.99 and a d' of 4.65 for each listener. It thus
seems that temporal coincidence caused frequency components to be perceptually fused, whether they were harmonically
related or not, and whether they came from the same location or from spatially-separated sources, 60, 120 or 180 degrees
apart.
There was a clear monotonic decrease of d' with SOA [p < 10⁻⁵]. This powerful effect of SOA upon the fusion of the
maskers and captors is consistent with past results found in the laboratory for the diotic and dichotic fusion of the same tonal
stimuli presented over headphones, as well as for the fusion of brief noise bursts in a free field (Turgeon and Bregman, 1996,
1999). From the mean SOA thresholds estimated for the eight ∆θ-by-harmonicity conditions, an SOA between 26 and 37 ms
(i.e., the range extending from one SE below the lowest mean threshold found in the present experiment to one SE above the
highest one found) seems to trigger the perception of concurrent brief tones as separate events. This is in good agreement
with the estimated 23-to-42 ms range for the perception of the same tones as separate events when they were presented over
headphones (chapter 4 in Turgeon, 1999).
The 25-to-40-ms range of the mean SOA thresholds for the diotic, dichotic and free-field segregation of brief tones agrees
with the literature on auditory grouping, reviewed by Darwin and Carlyon (1995), showing that an SOA of 30 to 40 ms is
required to prevent a partial from contributing to the overall timbre, to the lateralization and to the vowel identity of a
complex sound. There is a close correspondence between the magnitude of the SOA leading to the perception of separate
sounds and that for the computation of its emergent properties, since timbre, vowel quality and lateralization are properties of
perceptually-segregated sound events. It is worth noting that this does not seem to apply to all perceptual properties of
sounds. For instance, the SOA needed to prevent a partial from entering the computation of the pitch of a complex tone,
estimated as 300 ms by Darwin and Ciocca (1992), is an order of magnitude higher than our estimated 30 ms SOA for event
perception. This discrepancy between the temporal limits for pitch and event perception may be related to differences in their
underlying neural mechanisms (Brunstrom and Roberts, 2000).
Effect of harmonicity and of the spatial separation of sound sources (∆θ). There was a weak but consistent effect of
harmonicity in promoting the fusion for asynchronous masker and captor tones, as measured by d' scores for each listener
[p<0.01]. The spatial separation of sources (∆θ) did not affect rhythm sensitivity at all, as estimated by d' [p>0.1]. The d'
scores are compatible with the highly consistent mean SOA thresholds found across the different spatial and spectral
relations, as shown in Figure 1. The mean thresholds all fell between 28.5 ms (for ∆θ of 120 degrees and different F0s) and
34.2 ms (for ∆θ of 180 degrees and the same F0). Note that a higher threshold indicates more fusion, since a larger asynchrony is
required to perceptually segregate the maskers from the captors. From this figure, it is clear that the effect of harmonicity was
weak and that ∆θ had no effect on fusion; still, harmonicity slightly affected the temporal disparity needed for the perception of
separate events - a mean SOA of 32.4 ms (SE=3 ms) for harmonic stimuli, versus 30.1 ms (SE=3.9 ms) for inharmonic ones. This 2-ms
difference between the mean SOA threshold estimates for harmonic and inharmonic tones corresponds to that found for their
presentation over headphones (chapter 4 in Turgeon, 1999). The present results suggest that only spectro-temporal
regularities matter for the cross-spectral segregation of concurrent brief tones in a free field, SOA making by far the greatest
contribution.
Figure 1: Mean SOA thresholds across individual listeners for different spectral and spatial relations between the masker and
captor tones having the same F0 (harmonic) or different F0s (inharmonic), at four angular separations of their sources in a
semi-circular speaker array. Standard errors (SE) are indicated.
Discussion
Temporal coincidence, and deviations from it as induced by onset and offset asynchrony, was by far the most important
factor for the perception of short-duration tones as one or two sounds. Whereas masker and captor tones fused into a single
masker-captor event when they were synchronous, when they were separated by an SOA of about 30 ms, they were
segregated as two distinct events. Strong fusion was clearly shown by the perfect rhythm-identification performance at 0-ms
SOA (PC of 0.99). On the other hand, clear segregation was shown by the low performance at 36 ms (mean PC of 0.70) and
48 ms SOA (mean PC of 0.67). Intermediate values of SOA of 12 and 24 ms produced ambiguous cases of grouping, in
which the maskers and captors were neither fully fused, nor fully segregated. This ambiguous grouping might be linked to
the inherent temporal constraints of the auditory system due to short-term adaptation of the auditory-nerve fibers (Kiang et
al., 1965). As a result of the 10-to-20 ms period that it takes for an onset-sensitive neuron to return to its baseline activity,
there might be a minimum temporal disparity required for the system to distinguish two consecutive sound events which are
temporally contiguous. This is the situation when two sounds are close together in time and separated by a brief period of
silence, as is the case for the detection of a temporal gap, or when they are temporally overlapping, as is the case in our RMR
studies. The hypothesis of short-term adaptation as imposing some limit for the temporal resolution of sound events at
different places in the spectrum is consistent with the estimated minimum 30 ms disparity needed to detect a gap across the
spectrum, i.e., an offset-to-onset interval (Formby, Sherlock and Li, 1998) and to detect an onset-to-onset and offset-to-offset
disparity across the spectrum in our RMR studies.
The presence or absence of a common F0 does not seem to play an important role for the segregation of brief concurrent
tones as shown by the small differences in PC and d' obtained for harmonic and inharmonic maskers and captors.
Furthermore, Figure 1 shows that it only weakly affected the temporal disparity needed for their segregation as separate
events. This is consistent with the results found for the presentation of the same stimuli over headphones (chapter 4 in
Turgeon, 1999). Further experimentation should attempt to determine whether the weak role of harmonicity for the fusion of
short-duration sounds is related to differences in the temporal limits for the segregation of sounds as separate events and the
computation of their pitch (Darwin and Carlyon, 1995).
In this study, the angular separation of the sources (∆θ) did not yield any difference in fusion, whether fusion was estimated
from d' or from SOA thresholds based on PC scores. This runs contrary to the results of research in which the same sounds
were presented over headphones (chapter 4 in Turgeon, 1999). It might be that dichotic separation is more efficient for sound
segregation because it is an extreme case of interaural differences for sounds happening simultaneously, the stimulation of
one sound being delivered to one ear only, while that of the other sound(s) is delivered to the other ear only. Free-field
testing is more akin to real-world situations in which each of many individual sounds stimulates both ears, though at slightly
different times and intensities, allowing for the computation of the location of each sound source. When drawing conclusions
about the contribution of spatial disparities, one should not consider dichotic presentation as reflecting ecologically valid
differences in sound-source locations. Even when two sound sources are close to different ears, a sound coming from one of
them usually stimulates the two ears, albeit with larger binaural differences in intensities and time of arrival than if sources
were closer to the midline axis. For this reason, the separation of sources in a free field is considered as more representative
of the true contribution of spatial separation to sound-source segregation. This contribution seems to be very weak when two
sound sources are simultaneously active. It is also worth noting that steady-state sounds were used in the present study.
Tones fluctuating in amplitude might permit spatial differences to cause segregation, especially with longer tones. This
remains to be empirically investigated.
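SOA thresholds of the kind discussed above are typically obtained by fitting percent-correct data with a Weibull psychometric function (Weibull, 1951) and reading off the SOA supporting a criterion performance level. A minimal sketch of that step, with all parameter values hypothetical:

```python
import math

def weibull_pc(soa_ms, alpha, beta, chance=0.5):
    """Weibull psychometric function: proportion correct rises from the
    chance level towards 1.0 as the temporal disparity (SOA) grows."""
    return chance + (1 - chance) * (1 - math.exp(-((soa_ms / alpha) ** beta)))

def soa_threshold(pc_criterion, alpha, beta, chance=0.5):
    """Invert the fitted function to find the SOA giving pc_criterion."""
    p = (pc_criterion - chance) / (1 - chance)
    return alpha * (-math.log(1 - p)) ** (1 / beta)

# With hypothetical fitted parameters (alpha = 30 ms, beta = 2), the SOA
# supporting 75% correct segregation in a 2AFC-style task would be:
print(soa_threshold(0.75, 30, 2))
```

In practice alpha and beta would be estimated from the observed PC-versus-SOA data before the threshold is read off.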
An important implication of this research is that when brief complex tones happen at the same time, sound-source
segregation ("how many" individual sources are perceived) is independent of sound-source separation ("where" individual
sources are relative to each other in the immediate environment). This is consistent with the claim that localization ("where")
entails segregation ("how many"), but not the reverse. To localize a source, a source has to be perceived in the first place. For
instance, if the bark of a dog or the sound of an unknown animal is heard as coming from a precise location, its source has to
be segregated from the other environmental sources, and this whether it is identified or not. However, a source can be
perceived and identified without being localized. Everyone has at some point experienced hearing a familiar sound distinctly without being able to tell exactly where it was coming from. A similar reasoning holds for pitch: pitch perception
is a property of a perceptually-segregated sound; nevertheless a sound can be segregated without having a pitch, as happens
when a brief click without a definite pitch is perceived. Sound segregation is such a basic property of audition that one might
expect that the system computes it even in the face of ambiguity in the signal (e.g., as to "where" it comes from).
Summary of conclusions
The use of the RMR paradigm, which creates ambiguous auditory figures, allows for the evaluation of the relative
importance of auditory-grouping cues in sound-source determination. It has shown that: i. temporal coincidence is sufficient
for the perceptual fusion of short-duration tones; ii. an onset-to-onset disparity between 28 and 35 ms segregates them as
separate sound events; iii. spectral regularities, such as simple harmonic ratios, weakly affect the degree of fusion at intermediate temporal disparities from 12 to 24 ms; and iv. the fusion of short-duration tones which are spectrally non-overlapping appears to be independent of the angular separation of their sound sources. The short tones that were used in
this study may have been responsible for these results. Whether these conclusions apply to sounds of a longer duration and to
other types of complex sounds (e.g. speech sounds) awaits further experimentation.
References
Brunstrom, J.M., and Roberts, B. (2000). Separate mechanisms govern the selection of spectral components for
perceptual fusion and for the computation of global pitch. J. Acoust. Soc. Am., 107, 1566-1577.
Darwin, C. J. and Carlyon, R. (1995). Auditory grouping. In B.C.J. Moore (Ed.), Hearing: The handbook of
perception and cognition (2nd ed., Vol. 6, pp. 387-424). London: Academic Press.
Darwin, C. J. and Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asynchrony and ear of presentation
of a mistuned component. J. Acoust. Soc. Am., 91, 3381-3390.
Formby, C., Sherlock, L. P., and Li, S. (1998). Temporal gap detection measured with multiple sinusoidal markers:
Effects of marker number, frequency, and temporal position. J. Acoust. Soc. Am., 104, 984-998.
Hartmann, W. M. (1988). Pitch perception and the organization and integration of auditory entities. In G.W. Edelman,
W.E. Gall and W.M. Cowan (Eds.), Auditory function: Neurobiological bases of hearing (pp. 623-645). New York:
John Wiley and Sons.
Kiang, N. Y-S., Watanabe, T., Thomas, E. C., and Clark, L. F. (1965). Discharge patterns of single fibers in the cat's
auditory nerve. Research Monograph No. 35. Cambridge, MA: MIT Press.
Kidd, G., Mason, C., Rohtla, T. L., and Deliwala, P. S. (1998). Release from masking due to spatial separation of
sources in the identification of nonspeech auditory patterns. J. Acoust. Soc. Am., 104, 422-431.
Macmillan, N. A. and Creelman, C. D. (1991). Detection Theory: a User's Guide. Cambridge, MA: MIT Press.
Turgeon, M. (1999). Cross-spectral grouping using the paradigm of rhythmic masking release. Doctoral
dissertation, McGill University.
Turgeon, M., and Bregman, A.S. (1996). 'Rhythmic Masking Release': A paradigm to investigate the auditory
organization of tonal sequences. In: Proceedings of the 4th ICMPC, pp. 315-316.
Turgeon, M. and Bregman, A.S. (1999). Rhythmic Masking Release II: Contribution of cues for perceptual
organization to the cross-spectral integration of concurrent narrow-band noises in a free field -- asynchrony,
correlation of rapid intensity changes, frequency separation and spatial separation. Unpublished manuscript, Dept. of
Psychology, McGill University Montreal, Quebec, Canada. Submitted to J. Acoust. Soc. Am.
Weibull, W. A. (1951). A statistical distribution function of wide applicability. J. Appl. Mech., 18, 292-297.
Proceedings paper
The Voice In Therapy: Monitoring Disease Process In Chronic Degenerative Illness.
Wendy L. Magee PhD SRAsT(M)
Music Therapy Department
Royal Hospital for Neuro-disability
West Hill
London SW15 3SW
Department of Music
University of Sheffield
Sheffield S10 2TN
Jane W. Davidson PhD
Department of Music
University of Sheffield
Sheffield S10 2TN
Background.
From the moment we are born, our voice is the instrument with which we communicate through
non-verbal vocalisations (H. Papousek, 1996). Intuitively, care-givers respond to these non-verbal
vocalisations in an interactive way, imitating, extending and developing the pitch, melodic contour,
rhythm, phrasing and volume of the infant's vocal gestures (M. Papousek, 1996; Stern, 1985). In this
way, a child learns to interact and develops socially and emotionally.
Therefore, the voice, with its shifting, fluid, musical make-up provides the basic vehicle for human
communication and interpersonal relationships. However, an individual who acquires
neuro-degenerative disease faces the possibility of total loss of voice and the most primitive and
spontaneous means of communication. Although there are many augmentative communication aids
and assistive technologies now available for people who can no longer speak, the psychosocial
consequences of losing all ability to voice cannot be underestimated.
Music Therapy is the planned and intentional use of music to meet an individual's social,
emotional, physical, psychological and spiritual needs within an evolving therapeutic relationship. In
the therapy session, the therapist and client explore the client's world together, basing all interaction
on the client's musical utterances or musical preferences. This forms the basis for the therapeutic
relationship.
Within the clinical literature with a neuro-degenerative population, particular focus has been given to
the use of music for emotional expression and personal interaction skills (Magee, 1995a, 1995b;
Brandt, 1996; O'Callaghan and Turnbull, 1987 & 1988; O'Callaghan and Brown, 1989) and life
review processes through song choice and song-writing (O'Callaghan, 1984, 1990, 1995, 1996, 1999).
Although music therapy programmes have also aimed to improve functional speech through rhythmic
speech drills and singing (Erdonmez, 1976; Crozier and Hamill, 1988), little attention has been given
to the role that singing in therapy may play in meeting the holistic social, emotional and physical
needs of the patient who faces gradual voice loss as part of a degenerative disease process.
Aim.
This paper presents a single case study taken from a larger study investigating music therapy in
chronic neurological illness (Magee, 1998). This case study explores the experience of the physical
act of singing in the therapeutic process for an individual living with chronic degenerative illness.
Method.
A group of adults with Multiple Sclerosis were recruited from multidisciplinary referrals and
self-referrals at a residential and day care facility for complex neuro-disability. Participants received
individual music therapy from a qualified, state registered music therapist as part of a wider clinical
programme. The music therapist was the primary researcher and so worked as a participant researcher
for the study.
The music therapy sessions took place weekly for a period of approximately six months for each
participant. The session format included active participation in exploring instruments, joint clinical
improvisation with the therapist and singing songs of the participant's choice which had particular
meaning to them. Discussion of the musical material or personal material relating to it was included in
the session if the participant indicated a desire to do so.
Primary data were collected in the form of focussed interviews held after sessions by the
therapist/researcher. Secondary sources of data included the verbal, musical and behavioural
responses from sessions, as well as open coding analytical notes made during transcription of the
interviews. Three forms of data therefore emerged from the process.
A modified grounded theory paradigm was used to analyse data employing the steps of open and axial
coding (Strauss & Corbin, 1990). Trustworthiness was gained through prolonged involvement,
persistent observation, long-standing clinical experience with this population, and peer debriefing
with the multidisciplinary team. Triangulation was implemented on several levels. Ongoing analysis
of the clinical material was taken to an independent music therapy supervisor whose theoretical
framework differed from the therapist/researcher's, offering alternative interpretations of events to
those made by the researcher and thereby enhancing objectivity. This process was also implemented
with selections of the interview analyses, using an independent auditor familiar with therapeutic
theory. Case-study design was used to report the findings.
Results.
Open coding of the data found that individuals overtly or subtly monitored the changes in their
physical, vocal or cognitive functioning resulting from their disease process. This phenomenon was
entitled 'Illness monitoring'. This action included individuals describing a particular ability in different
situational contexts, comparing one's ability with others', monitoring the type of change experienced,
the extent of any change, and making temporal comparisons of 'now' to 'before'.
Individuals often consistently assessed different aspects of their own physical, cognitive and vocal
functioning in relation to those around them, who lived on the same ward. Others with whom general
living space was shared may have had the same diagnosis, but may have been in a more advanced
stage of the disease. Monitoring change in this way served to increase awareness and self-knowledge,
thereby helping individuals to regain some sense of control. Furthermore, increased self-knowledge
left one better prepared to employ strategies for dealing with the emotional consequences of a
negative change in abilities.
A single case study will be used to illustrate the results of axial coding, examining the particular
phenomenon of 'vocal monitoring'.
From the larger analysis of the group's data, it was found that individuals drew on coping
strategies to a greater degree when the individual felt a lower sense of control, a higher sense of threat,
and a greater sense of confrontation by their disease. Drawing on coping strategies in this way served
to mask deeper responses to the disease process. It can therefore be assumed that the process of vocal
monitoring stimulated emotional responses for Jack which he felt the need to control. When he was
able to engage more emotionally with the music however, vocal monitoring was less likely to take
place.
Conclusions.
This case study highlights the importance which singing can hold for an individual with chronic
degenerative disease. However, the meaning of singing found in this study does not support the ideas
put forward in previous music therapy literature reporting on song-based techniques: song themes
were not a primary facilitative factor in Jack's use of song. Despite his illness and the difficulty he
was experiencing in vocalising, he used the songs within his music therapy as a way to defy his
illness process. Certainly
he gained greater meaning in life through his act of singing songs within a therapeutic relationship. In
reality, Jack died of pneumonia and respiratory failure two years after his participation in this study
finished. Retrospectively, it is apparent that his experience of singing his songs represented life's
breath running through him. His continual references to his breathing, throat and voice, on reflection,
indicate a high level of anxiety which he was attempting to conceal.
Considering theoretical frameworks offered by health sociology, through the act of singing Jack was
testing the physical limits of his body and making comparisons in terms of temporal and situational
parameters (Corbin and Strauss, 1987). In this way, he achieved greater senses of independence, skill
and ability, which helped to shift his sense of identity using the therapist to validate his performance.
The phenomenon 'Illness monitoring' which emerged in this study has elsewhere been entitled the
'dialectical self' (Charmaz, 1991). This phenomenon, like illness monitoring, involved taking the body
as an object, appraising it, and comparing it with the self in different temporal and situational
frameworks.
Active involvement in music therapy through singing facilitates a physical expression in which
individuals explore their remaining physical capabilities. Through sustained exploration of their own
individual physical change and loss, the physical experience becomes an intensely emotionally
charged one relating directly to aspects of the illness identity. It is imperative for the music therapist
working with this population to understand that through physical monitoring during the act of singing,
individuals with chronic degenerative illness may become more acutely aware of their emotional
responses to their illness process.
References.
Brandt, M. (1996). "'This is my life.' Songwriting and song interpretation with Huntington's patients."
In: Smeijsters, H. and Mecklenbeck, F. (Eds.), Book of Abstracts of 'Sound and Psyche', 8th World
Congress of Music Therapy, Hamburg, 1996, p. 216. Druck: Service-Cruck Kleinherne, Dusseldorf.
Charmaz, K. (1991). Good Days, Bad Days. The Self in Chronic Illness and Time. New Brunswick:
Rutgers University Press.
Corbin, J. & Strauss, A. (1987). Accompaniments of Chronic Illness: Changes in Body, Self,
Biography, and Biographical Time. In Roth, J. & Conrad, P (Eds.), Research In the Sociology of
Health Care: A Research Annual. The Experience and Management of Chronic Illness, 6, 249-281.
London: JAI Press Inc.
Crozier, E. and Hamill, R. (1988). The benefits of combining speech and music therapy. Speech
Therapy in Practice, November, 9-10.
Erdonmez, D. (1976). The Effect of Music Therapy in the Treatment of Huntington's Chorea Patients.
Proceedings of the 2nd National Conference of the Australian Music Therapy Association
Incorporated, 58-64.
Magee, W. (1995a). 'Case studies in Huntington's Disease: music therapy assessment and treatment in
the early to advanced stages.' British Journal of Music Therapy, 9(2), 13-19.
Magee, W. (1995b). 'Music Therapy as Part of Assessment and Treatment for People Living with
Huntington's Disease'. In: C. Lee (Ed.), Lonely Waters: Proceedings of the International Conference
Music Therapy in Palliative Care, 173-183. Oxford: Sobell Publications.
Magee, W. (1998) 'Singing my life, playing my self'. Investigating the use of familiar pre-composed
music and unfamiliar improvised music in clinical music therapy with individuals with chronic
neurological illness. Unpublished doctoral dissertation, University of Sheffield, UK, #9898.
O'Callaghan, C. (1984). Musical Profiles of Dying Patients. Australian Music Therapy Association
Bulletin, 7(2), 5-11.
O'Callaghan, C. (1990) Music therapy skills used in song writing within a palliative care setting.
Australian Journal of Music Therapy,1, 15-22.
O'Callaghan, C. (1995). Songs Written by Palliative Care Patients in Music Therapy. In: C. Lee (Ed.),
Lonely Waters: Proceedings of the International Conference Music Therapy in Palliative Care, 31-40.
Oxford: Sobell Publications.
O'Callaghan, C. (1996). Lyrical Themes in Songs Written by Palliative Care Patients. Journal of
Music Therapy, 33(2), 74-92.
O'Callaghan, C. (1999). Lyrical Themes in Songs Written by Palliative Care Patients. In: D. Aldridge
(Ed.), Music Therapy in Palliative Care, 43-58. London: Jessica Kingsley Publishers.
O'Callaghan, C. and Brown, G. (1989). Facilitating Communication with Brain Impaired Severely Ill
People: Using Neuropsychology and Music Therapy. Presented at N.A.L.A.G.'s Sixth Biennial
Conference, Melbourne, September 1989.
O'Callaghan, C., and Turnbull, G. (1987). The Application of a Neuropsychological Knowledge Base
in the Use of Music Therapy With Severely Brain Damaged Adynamic Multiple Sclerosis Patients.
Proceedings of the 13th Conference A.M.T.A., Melbourne, 92-100.
O'Callaghan, C., and Turnbull, G. (1988). The Application of a Neuropsychological Knowledge Base
in the Use of Music Therapy With Severely Brain Damaged Disinhibited Multiple Sclerosis Patients.
Proceedings of the 14th Conference A.M.T.A., Adelaide, 84-89.
Papousek, H. (1996) Musicality in infancy research: biological and cultural origins of early
musicality. In: Deliege, I. & Sloboda, J. (Eds), Musical Beginnings: Origins and Development of
Musical Competence, 37-55. Oxford: Oxford University Press.
Papousek, M. (1996) Intuitive parenting: a hidden source of musical stimulation in infancy. In:
Deliege, I. & Sloboda, J. (Eds), Musical Beginnings: Origins and Development of Musical
Competence, 88-112. Oxford: Oxford University Press.
Stern, D. (1985) The Interpersonal World of the Infant. New York: Basic Books.
Strauss, A. & Corbin, J. (1990). Basics of Qualitative Research. Grounded Theory Procedures and
Techniques. Newbury Park: Sage Publications, Inc.
Authors' note.
Wendy L. Magee BMus PhD ARCM SRAsT(M) is Head of Music Therapy at the Royal Hospital for
Neuro-disability, London, holding a clinical post as a music therapist working with adults with
acquired and complex neuro-disability, and developing research projects with this population. This
research is part of doctoral research undertaken whilst registered at the Department of Music,
University of Sheffield. Jane W. Davidson BA PGCE MA PhD Cert. Counselling is Senior Lecturer
in Music at the Department of Music, University of Sheffield. She is editor of the international journal
Psychology of Music and has researched on a wide range of topics from self and identity in singers
through to expressive body movement and piano performance, having over 50 publications to her
name in international peer-reviewed journals. Besides researching, she teaches a wide range of
courses and is an active performer, artistic director and producer.
The authors would like to acknowledge the Living Again Trust, the John Ellerman Foundation, the
Juliette Alvin Trust and the Music Therapy Charity who all contributed to funding this project. The
authors also would like to thank the research participants who took part in this study. The Royal
Hospital for Neuro-disability received a proportion of its funding to support this paper from the NHS
Executive. The views expressed in this publication are those of the authors and not necessarily those
of the NHS Executive.
Address for correspondence: Dr. Wendy L. Magee, Music Therapy Department, Royal Hospital for
Neuro-disability, West Hill, London SW15 3SW, UK
Proceedings paper
This paper draws from my own PhD research on assessing pupils' compositions. Within the National
Curriculum in England composing is a statutory part of the programmes of study for Music. From
September 2000 the revised orders for music introduce level descriptors to be used by teachers to best
fit a pupil's performance at the end of Key Stage 1 (age 7), Key Stage 2 (age 11) and Key Stage 3 (age
14).
From the original implementation of the Music National Curriculum, to subsequent revisions in 1992,
followed by the Dearing revision (1995) and the recent simplification (1998), the expectations at each
of the three Key Stages have narrowed. This is presented in the form of a sequential progression
through the elements of music (pitch, duration, dynamics, tempo, timbre, texture and structure). Thus:
Key Stage 3 pupils should be 'discriminating within and between the musical elements'
In 'Teaching Music in the National Curriculum', Pratt and Stephens (1995) presented this model in the
form of a table to indicate progression within each element. According to this, it follows that pupils
would first talk about loud, quiet and silence before going on to talk about gradating levels of volume,
and before progressing to recognise subtle differences in volume. This implies that musical conceptual
understanding is developed through an increasingly discriminatory vocabulary based on the principle
of quantitative addition. As Swanwick (1996) argues, most of the Key Stage statements in the Music
National Curriculum document are essentially quantitative in character rather than qualitative; he
urges 'we need to have criterion statements to pick up these qualitative shifts' (p. 34).
The theoretical framework draws from several research projects which have tried to map out these
qualitative shifts. For example, Swanwick and Tillman's model (1986) was adapted by Swanwick to
suggest a basis for establishing criteria for assessing composition and more recently for assessing
performance and listening (Swanwick, 1988, 1996; Hentschke, 1993). Other researchers, such as Flynn
and Pratt (1995), used a 'bottom-up' approach, which sought to make explicit the criteria
which the teachers identified in making such qualitative shifts. The DELTA Project (Development of
Learning and Teaching in the Arts) conducted by Hargreaves, Galton and Robinson (1996) devised a
methodology which claimed to make explicit the implicit criteria which teachers use to make
judgements about children's products. The findings for music made ground in developing a language
of assessment. Hargreaves et al. (1996) provided a five phase model which incorporated domain
specific as well as general cognitive aesthetic developments. The researchers reported the
developments in terms of phases so as not to be confused with Piagetian 'stages'. The five phases are
denoted as sensorimotor, figural, schematic, rule systems and professional. The model draws from
current domain-specific research and acknowledges its somewhat sketchy form at the time of writing
as reflecting the paucity of research in this area. Nevertheless, it provides some interesting insights
which draw together psychological research into aesthetic development. Rather than attempting to
draw out level descriptors Hargreaves and Galton's model sketches phases of development which
invite further research and 'real-world' application.
This paper draws from that part of my earlier research which set out to exemplify a 'real world'
application. It was also set in the climate of the raised profile of literacy across the curriculum taking
into account the reports on the use of Language within the Common requirements of the National
Curriculum (SCAA, 1997) and their attempt to provide a way forward for the role of language in
music education:
Teachers are encouraging the use of technical vocabulary with greater confidence but more help is
requested with regard to the musical vocabulary which should be taught at each key stage. (SCAA
1998, Section 11)
Teachers in all key stages need guidance on subject knowledge and how this knowledge can be
integrated in practical work including the development of aesthetic awareness and musical
vocabulary. (SCAA, 1998, Summary)
One objective of the research was to investigate how children used the language of the musical
elements as defined by the English National Curriculum as pitch, duration, dynamics, tempo, timbre,
texture and structure. I was interested to see how the terms were used by the children and to what
extent these revealed qualitative shifts in their conceptual understanding of their own and their peers'
compositions.
Methodology
The research included 154 children, 78 girls and 76 boys aged 9-13 years. The sample was taken from
Years 5-6 in the Upper Primary School (Upper Key Stage 2: ages 9-11) and Years 7-8 of the
Secondary School (Key Stage 3; ages 11-13). The research task was designed as part of the children's
curriculum music sessions led by the teacher (new in post) who was also the researcher. The children
had been asked to compose 'what they thought made a good tune'. The design took account of the pilot
study research in the following ways.
First, the composing task was presented in an open-ended way. To avoid influencing the listening
responses the task was not directed in a series of sequential stages. Second, the starting point was a
melodic composition; as such, it did not have an extra-musical referent. Third, the composing activity
was organised on an individual basis and not in groups, thereby allowing each composition to be
identified with each pupil's individual response. Fourth, the school was fortunate to have sufficient
keyboards for the children to share one between two. Although there may have been some pair-work
influence the children used separate headphones and were asked to work individually on their own
tunes. Keyboards were used as a means of 'controlling' the sound sources used, so that the children's
subsequent responses were not limited to a simple recognition of the instrument. They also proved to
be a highly motivating sound source across the sample age range. For the purposes of the research the
pupils were asked to choose their own sound from the sound bank but not to use a rhythm or beat
accompaniment. Note names and/or staff notation could be used as a means to map out the tune for
performance but this was not obligatory. The children worked on their tunes in 3 x 50 minute lessons
and performed and recorded their compositions onto audio tape.
The design took into account that for many children composing was a new experience. Some had
more experience of playing an electronic keyboard than others. The research also acknowledged that
some children had piano skills and that this would influence the musical outcome. However, it was
considered that the task was equally accessible for all children, open-ended enough to allow
individual approaches and age appropriate (in both the instruments used and also the type of task).
Equally, the task fulfilled the requirements of the Music National Curriculum and was presented in
such a way as to encourage pupil ownership of the learning which was synonymous with the school's
philosophy of education.
In the final week the children were invited to appraise their compositions. For research purposes the
design investigated the children's listening responses to their own compositions and to those of their
peers. In practice the children listened to the recording of their class's compositions and, in the pause
in between each piece, they gave a mark out of 10 and wrote a reason for their choice on a given pro
forma. The results were collated and used as a basis for both quantitative and qualitative analysis.
Data Analysis
In order to analyse the data a coding scheme was developed to categorise the content of written
listening responses. An initial survey of the data produced 22 categories of response. These were
subsequently reduced to five broad categories as follows:
1. Musical Elements
Responses in this category refer to the elements of music as defined in the Music National
Curriculum. They include references to: Pitch, Duration, Dynamics, Tempo, Timbre, Texture,
Structure. Responses in this category might include for example, 'it was loud', 'it was short', 'the notes
went up and down', 'it had a hollow sound', 'it repeated'.
2. Style
In this category are responses which make stylistic references, for example, 'it sounds classical', 'it
sounds like Jazz', 'it sounded Japanese'.
3. Mood
Responses in this category indicate an affective response to the music, for example, 'it made me feel
happy', 'it was depressing', 'it was spooky'.
4. Evaluation of Composition
Responses in this category demonstrate an evaluative statement of the composition itself, for example,
'it was good', 'it was well put together'.
5. Evaluation of Performance
Responses in this category refer to an awareness of the qualities of the performance, for example, 'he
missed a note', 'it was played well'.
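The five categories above were applied by hand during analysis. Purely as an illustration of how such a coding scheme behaves, a crude keyword-based coder could be sketched as follows; all keyword lists here are hypothetical, drawn only from the example responses quoted above, and do not represent the study's actual procedure.

```python
# Hypothetical keyword lists drawn from the example responses quoted in
# the text; the study's real coding was done by hand, not by matching.
CATEGORIES = {
    "Style": ["classical", "jazz", "japanese"],
    "Mood": ["happy", "depressing", "spooky"],
    "Evaluation of Performance": ["missed a note", "played"],
    "Evaluation of Composition": ["good", "well put together"],
    "Musical Elements": ["loud", "short", "up and down", "hollow", "repeated"],
}

def code_response(text):
    """Return the first category whose keywords appear in the response."""
    text = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Uncategorised"
```

For example, `code_response("it sounds like Jazz")` would fall into the Style category, while `code_response("it made me feel happy")` would fall into Mood.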
An independent rater and I performed a reliability study in which a sample of the responses was
categorised. The results were correlated using statistical measures and showed that the coding scheme
allowed a satisfactory level of agreement and that further analysis was justified.
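The text does not specify which agreement statistic was used for the reliability study. One common chance-corrected choice for two coders assigning categorical labels is Cohen's kappa, sketched here with hypothetical coding data for illustration:

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement between two raters, corrected
    for the agreement expected by chance given each rater's marginals."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical: two raters coding six responses into the five categories
a = ["Elements", "Mood", "Style", "Elements", "Mood", "Performance"]
b = ["Elements", "Mood", "Style", "Mood", "Mood", "Performance"]
print(cohens_kappa(a, b))
```

Values above roughly 0.6-0.7 are conventionally read as substantial agreement, which is the kind of "satisfactory level" that would justify further analysis.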
Results
For the purposes of this paper I shall present a summarised version of the
qualitative results. The initial analysis involved mapping the responses into the five categories
described above as Musical Elements, Style, Mood, Evaluation of Composition and Evaluation of
Performance. I shall discuss each category in turn focusing on the language used by the children. The
extracts from the children's responses were chosen because they were representative of particular
types of response.
Musical Elements
Within this category the responses were subdivided into a further 7 subcategories which corresponded
to the musical elements within the Music National Curriculum (DFE, 1995). I shall focus on each in
turn.
Pitch
In the Music National Curriculum, pitch is described from Key Stages 1-3 as:
(KS 1) high /low
(KS 2) gradations of pitch e.g. sliding up/down, moving by step/leap, names for pitch
(KS 3) various scales and modes e.g. major, minor
The children's responses revealed a range of ways of talking about pitch in relation to their tunes,
showing a greater degree of aesthetic differentiation than the schema presented in the Music National
Curriculum. To summarise, responses which refer to pitch show:
● children at KS 2 and 3 identify high and low and gradations of pitch, and some recognise
scales; e.g. 'it is a really high tune', 'I like it because it goes up and down', 'it did sound very like
a scale'
● children show personal preferences and prefer tunes which are not too high or too
low; e.g. 'it was too low for me', 'too high and boring'
● younger children prefer tunes where the pitch does not move around too much, whereas
older children tolerate a greater range of pitch; KS 2 e.g. 'it was a good tune and none of the
notes clashed', 'all the notes go really well', 'the notes moved around too much'; KS 3 e.g. 'the
notes are low and go with one another', 'very interesting - big range of notes'
● movement metaphors are used to describe pitch contour; e.g. 'running up and down'.
Duration
In the Music National Curriculum, duration is described from Key Stages 1-3 as:
(KS 1) long/short; pulse or beat; rhythm
(KS 2) groups of beats, e.g. in 2s, 3s, 4s, 5s; rhythm
(KS 3) syncopation, rhythm
The responses divided between those which focused on the qualities of the duration as beat or rhythm
and those which focused on the duration of the tune as a whole. The children's responses which
focused on duration/beat-rhythm were considerably less differentiated than those within the
category of pitch. To some extent some of the responses might have been made in relation to the
quality of performance as much as to the rhythmic qualities of the tune. No explicit responses
demonstrated an understanding of groups of beats per se. To summarise, responses which refer to
duration/beat-rhythm show:
● recognition of a beat or rhythm; e.g. 'I like the rhythm', 'I liked the beat'
● a sense of rhythm which can be followed and which flows; e.g. 'needs a better beat - should
flow more', 'quite bitty, rhythm hard to follow'
● rhythm in time; rhythm off or on the beat; e.g. 'out of time', 'a good off beat'.
Apart from one response, which comments on the length of the notes ('[it's] good how it is staccato'),
most of the children's responses which use the words long and short refer to duration as the length of
the tune as a whole. In this way duration is linked to the element of structure. To summarise,
responses which refer to duration/long-short show:
responses which refer to duration/long-short show:
● preference for tunes which are neither too short nor too long; e.g. 'it was short and boring',
'short and snappy', 'it went on too long and was boring'.
● preference for longer tunes by some KS 3 children; e.g. 'good flow and rhythm, but a bit
short'
● appropriate duration for each particular tune; e.g. 'short and effective', 'it was nice that it
was short'.
Dynamics
In the Music National Curriculum dynamics are described from Key Stage 1-3 as:
(KS 1) loud, quiet, silence
(KS 2) different levels of volume, accent
(KS 3) subtle differences in volume, e.g. balance of different parts
Not all the keyboards were touch sensitive and so the volume was controlled at source rather than
through touch. The most marked difference in this sub-category was that the Key Stage 2 children
made far more references to dynamics than the Key Stage 3 children. From this initial analysis there is
evidence to suggest that the girls produced more responses which showed a preference for quiet music
and boys produced more responses which favoured loud music, especially in upper Key Stage 2. To
summarise, responses which refer to dynamics show:
● recognition of loud or quiet, decreases and increases in volume; e.g. 'it is quiet', 'I like the
fading out bits'
● parts of the tune which varied in volume; e.g. 'it has one side soft and one side loud', 'it was
quiet and has a slight echo'
● dislike for tunes which were either too soft or too loud; e.g. 'too quiet', 'too loud all a long'
● preferences which relate mood to dynamics; e.g. 'it was soft and calming'
● KS 2 produced more responses than KS 3, boys prefer loud tunes especially at KS 2; e.g.
'it was not loud enough' (boy), 'I don't really like it because it is loud' (girl).
Tempo
In the Music National Curriculum tempo is described from Key Stage 1-3 as:
Timbre
In the Music National Curriculum timbre is described from Key Stage 1-3 as:
(KS 1) quality of sound, e.g. tinkling, rattling, smooth, ringing
(KS 2) different qualities, e.g. harsh, mellow, hollow, bright
(KS 3) different ways timbre is changed, e.g. by mute, bowing/plucking, electronically; different
qualities, e.g. vocal and instrumental tone colour
To summarise, responses which refer to timbre express :
● qualities of the sound which relate to other sound sources and instruments; e.g. 'it sounded
like bottles', 'it sounded like the flute', 'it sounded like a bassoon', 'it sounded like someone
playing the sitar'
● preference for changes of the sound; e.g. 'its good with lots of sounds', 'I liked the sound
effects', 'I don't think the sound was relevant to the tune'
● qualities of the sound in terms of mood, depth and association; e.g. 'funny sound', 'weird
sound', 'heavy sound', 'ghostly sound'
● more responses by the girls than by the boys.
Texture
In the Music National Curriculum texture is described from Key Stage 1-3 as:
(KS 1) several sounds played or sung at the same time/one sound on its own
(KS 2) different ways sounds are put together e.g. rhythm on rhythm; melody and accompaniment;
parts that weave, blocks of sounds, chords.
(KS 3) density and transparency of instrumentation; polyphony and harmony
There were very few responses in terms of texture and this can be accounted for by the nature of the
composition task. This was essentially a linear melodic construction and did not require more than one
part at once. Some children used chords to accompany their melodies and some responses reflect this
e.g. 'long, with nice chords', 'the chords go well together'.
Structure
In the Music National Curriculum structure is described from Key Stage 1-3 as:
(KS 1) different sections, e.g. beginning, middle, end; repetition, e.g. repeated patterns, melody, rhythm;
(KS 2) different ways sounds are organised in simple forms, e.g. question and answer, round, phrase,
repetition, ostinato (a musical pattern that is repeated many times), melody;
(KS 3) forms based on single ideas e.g. riff, forms based on alternating ideas e.g. rondo, ternary,
forms based on developmental ideas e.g. variation, improvisation.
The responses showed that the pupils perceived structure in a number of ways.
To summarise, responses which refer to structure show:
● attention to beginnings and endings more than to middle events; e.g. 'I liked the beginning
bit', 'good start', 'it finished well'
● extra-musical associations and musical events within the structure; e.g. 'at the start it is a bit
creepy and heavy', 'in the beginning it sounds like birds'
● structure used to locate one or more particular musical events within the same tune;
e.g. 'in the middle it was a bit of a copy', 'a tiny difficulty in the middle'
● perception of simple/complicated structures, in terms of repetition, change and pattern;
e.g. 'a bit repetitive to begin with', 'he mostly uses the same keys', 'it kept on continuing itself
forever and forever', 'it didn't change much and had no variation', 'he just repeated', 'it changed a
lot which was good'
● that KS 3 pupils are more aware of the structural process i.e. how a tune is built up and
its effectiveness; e.g. 'he should have added more in between', 'it was like it was gradually
building up', 'it was put together well', 'plain but good', 'its simple but it has something to it'.
Style
Children's style sensitivity is represented in a number of ways. To summarise, responses in this
category refer to:
● chronology; e.g. 'like a 1900's tune', 'a bit old', 'medieval'
● musical features from the music of other countries; e.g. 'it sounded Chinesy', 'it sounds good
like Indian Music', 'sounds Japanese', 'sounds oriental', 'I like the Caribbean beginning', 'sounds
very Egyptian'.
Children who responded in this way have picked out a quality in the sound, such as the use of the
Indian sitar in the sound bank, or a musical feature, such as the intervallic pitch relationships in the
'Egyptian tune', or the syncopated rhythm of the 'Caribbean tune'. As they do not yet have the
vocabulary to describe the specific musical features, they use stereotypes.
● particular musical styles, qualities of style and style preference; e.g. 'sounded like jazz',
'very into rock music', 'quite classical', 'it sounded jazzy', 'it was funky', 'it has a swinging beat'
● styles associated with the media, with other songs, styles which would be appropriate for
films, style similarity between peers; e.g. 'it was obviously copied from a nursery tune', 'it was
copied off a pop song that came out recently', 'definitely heard it on TV before', 'mix of When
the Saints and London's Burning', 'it's like a computer game', 'it sounds like its out of a cartoon',
'sounds like the beginning of a TV programme', 'sounds like a cat food advert', 'it struck me as
something out of a film', 'like space music', 'something out of a Walt Disney Film', 'like
something out of a fairy tale', 'like something out of a child's detective movie', 'too much like
the Snowman', 'like the Little Mermaid', 'something from Maid Marion', 'like something out of
Grease', 'like something out of Bugs Bunny', 'nearly the same as Jai's'.
● identification of tunes for and from particular contexts; e.g. 'like something from a circus
or a fair', 'I liked it because it reminded me of a church', 'for a horror movie', 'good for a play',
'sounds like a disco', 'sounds like a piece for ballet', 'it reminded me of a holiday'.
Whereas at Key Stage 2 style responses were dominated by references to film, video and TV, Key
Stage 3 pupils' perception of style related to personal experience, preference and identity.
Mood
Children responded in this category in a number of ways. To summarise, responses in this category
include:
● positive and negative moods; e.g. 'it was jolly and fun', 'a nice happy tune', 'fun and
entertaining'
● responses where the listener identifies with the mood; e.g. 'I liked it because it was a happy
tune', 'it's a tune that makes me feel lonely', 'that tune makes me feel jolly', 'it makes you feel
good'
● responses where the tune is identified as having a mood relating to atmosphere or to
movement qualities; e.g. 'OK and very calm', 'a relaxing piece of music', 'it sounds restful', 'its
lively', 'lumpy and springy', 'jerky but good', 'it's got bounce', 'I love this its so bouncy and great',
'like an old man walking', 'like a fairy dancing', 'like someone diving'
● recognition of a change of mood and juxtaposition of two mood states; e.g. 'it goes scary,
then normal', 'sweet and catchy', 'funny and strange'
● moods relating to the 'life' and 'feeling' of a piece; e.g. 'it has a nice feel to it', 'it doesn't have
any life', 'it was good and full of life', 'she should have added a bit more spice to the tune', 'it
was playful'.
Evaluation of Composition
To summarise, responses in this category include :
● value judgements and qualified value judgements; e.g. 'it is good', 'I like it', 'it is my kind of
music', 'not to my taste', 'I liked the beat'.
As above, many children justified their preference by valuing one or more aspects of the tune in the
categories of Musical Elements and Style.
● responses referring to the fit of the tune as a 'whole'; KS 2 e.g. 'I didn't like this because it
bumped', 'it was a bit rickety', 'it was a bit wobbly'. KS 3 e.g. 'it doesn't fit together properly', 'it
didn't mix well', 'a bit unstable', 'it sounded together', 'synchronised', 'it was in place'
● responses referring to the quality of thinking/organisation of ideas and expectations
perceived in the music; KS 2 e.g. 'it was a bit messy', 'it was a bit muddled'. KS 3 e.g. 'it was
well organised and she knew what she was doing', 'good and well organised', 'well thought up
and practised', 'too random', 'he didn't know what he was doing', 'it sounded like it was made up',
'it's just anything that he is playing'.
● responses valuing originality, imagination and creativity; e.g. 'he was being quite
imaginative', 'very creative and good'.
Interestingly the focus of the responses showed that the younger children were more likely to express
themselves in terms of whether the piece was copied whereas the older Key Stage 3 children were
more concerned with the quality of originality, difference, imagination and creativity.
● responses which 'match' with the pupils' view; e.g. 'it suited her', 'not as good as I thought it
would be'
● responses referring to the expectations in the music; e.g. 'it was too predictable', 'you sort of
knew which note would come next', 'a bit off course towards the end'
● the more intuitive aesthetic judgements and comments on the 'properness' of the music at
KS 3; e.g. 'started brilliant - like a proper piece', 'good but no proper ending', 'very
professional', 'it sounded like a real tune', 'I just enjoyed it so much as it was getting to the point
of a piece'.
Evaluation of Performance
Children produced different types of response in this category. To summarise, responses in this
category refer to :
● how well the tune was played, identification of mistakes; e.g. 'very good, well played', 'good
but a few mistakes', 'it has few jolts and didn't go well'
● practice and technical mastery of the instrument; e.g. 'could be played better', 'very good, but
I think she practised'
● differentiation between composition and performance; e.g. 'it was skilfully played and
composed', 'it was a good effort and it came out well'
● differentiation between technical demands and the experience of the player; e.g. 'she
stumbled a bit but it was quite good', 'very good except for the slip at the start', '[it was] dull but
well played'.
Discussion
To a certain extent the responses in each sub-category confirm the way the Music National
Curriculum defines the increasing levels of discrimination within each musical element i.e. pitch,
duration, dynamics, tempo, timbre, texture and structure across Key Stages 1-3. The results therefore
provide verbal evidence of how children listen to and appraise music in relation to the English Music
National Curriculum.
However, the results also show ways in which the children's musical understanding becomes
increasingly differentiated both within each sub-category and between categories of perception,
beyond the definition presented within the documentation of the Music National Curriculum. This
gives a more detailed picture of how children use language in their responses in each of the
subcategories and, more particularly, shows us what they value. Responses also change across the
categories, within the categories and with respect to age and gender, leading to a fuller picture of the
qualitative shifts in the conceptual understanding of music. This is an area for future research.
The results also reveal that responses which show an absence of technical vocabulary may
nevertheless communicate a sense of the music. However, some responses use technical statements
inappropriately e.g. 'out of tune' was used of the intervallic range within the pitch contour. The
conclusion to be drawn is that the use of technical vocabulary may not be evidence of musical
understanding.
Another consideration in the analysis is how far the responses were influenced by musical expertise
and peer group issues of perceived musical expertise, status, friendship and competition. This is
illustrated by examples from the qualitative data which take into account biographical and social
observations of the children (Mellor, 1999). For example, the experienced pianist responds with a
voice of expertise : 'could have practised more', 'original and good for someone who doesn't play the
piano'. The saxophonist with experience of playing jazz responds using phraseology common to
jazz style e.g. 'doesn't make the most of the rests, needs to sit back on the beat'.
From my experience as teacher and researcher some responses reflect the relative social status of the
children within the class. The use of the term high/low status is defined by my observation of how the
children interacted and whom they held in esteem amongst their peers. The qualitative analysis
therefore reveals that additional factors need to be taken into consideration. The particular value of a
teacher/researcher is the ability to analyse internal social hierarchy within classes which produces
another level of subjectivity beyond that of gender.
As the full research shows, whilst some responses share characteristic patterns or phases of
development, concerned with issues of recognition, conformity, appropriateness, originality and
reflection, listening responses show individualised profiles which are mediated by the listening
context and the social structure of the group. The question for policy makers and music education
research must be how to integrate these observations into the assessment model. Whilst general
guidelines may be welcomed, over-simplification as presented by the Qualifications and Curriculum
Authority (2000) might be limiting and misleading. In seeking to 'level out' the types and range of
performance that pupils demonstrate I hope we don't inadvertently 'level out' the richness of this
inquiry which is still a largely uncharted territory.
References
Department of Education and Science (1991) Music for Ages 5-14. London: HMSO.
Department of Education and Science (1992) Music in the National Curriculum. London: HMSO.
Department for Education (1995) Music in the National Curriculum. London: HMSO.
Flynn, P. and Pratt, G. (1995) Developing an understanding of appraising music with practising
primary teachers. British Journal of Music Education, 12, 127-158.
Hargreaves, D.J., Galton, M.J. and Robinson, S. (1996) Teachers' assessments of children's classroom
work in the creative arts. Educational Research, 38, 2, 199-211.
Hentschke, L. (1993) Musical development: testing a model in the audience-listening setting.
Unpublished PhD thesis. Institute of Education: University of London.
Mellor, L. (1999) The language of self-assessment: towards aesthetic understanding in music. In E.
Bearne (Ed.) Use of Language across the Secondary Curriculum. London: Routledge.
Pratt, G. and Stephens, J. (Eds.) (1995) Teaching Music in the National Curriculum. National
Curriculum Music Working Group. Oxford: Heinemann.
Qualifications and Curriculum Authority (1998) Breadth and Balance in the National Curriculum
School Curriculum and Assessment Authority (1997) Music and the Use of Language at Key Stage 3.
London: SCAA.
Swanwick, K. and Tillman, J. (1986) The sequence of musical development. British Journal of Music
Education, 3, 305-339.
Swanwick, K. (1996) Music before the National Curriculum. In G. Spruce (Ed.) Teaching Music.
London: Routledge.
Back to index
Proceedings abstract
Dr Ian Cross
ic108@cus.cam.ac.uk
Background:
The theory of natural selection constitutes the ontological core of many recent
theories of biology, mind and culture. From the perspective of evolution, music
has been variously appraised as a significant agent in mechanisms of group
selection, as an elaborate means of mate selection, as an activity that is
parasitic on other, more adaptive behaviours and as an entirely hedonistic,
potentially maladaptive, optional extra in human evolution and behaviour.
Aims:
This paper re-examines the notion of "music" in the context of human evolution
and development, drawing on evidence from studies on infant capacities and
behaviours, from cross-cultural studies of musical behaviour, from theories in
cognitive archaeology and on the archaeological record to suggest that music
may have played a significant role in human evolution and still plays such a
role in cognitive development.
Main contributions:
Implications:
The ideas presented here have implications that are scientifically pragmatic,
in that they present hypotheses that are intended to be empirically testable,
and political, in that they suggest a possible basis for the valorisation of
musical activity in contemporary cultures.
Back to index
Proceedings abstract
Department of Psychology
CANADA
sandra.trehub@utoronto.ca
Background: In recent years there has been increasing interest in
mothers' vocal behaviour with prelinguistic infants. Not only do mothers
talk to their non-comprehending infants; they sing to them as well. The
apparent parallels between maternal speech and song to prelinguistic infants
may stem from the principal goals of such interactions, which are related to
modulation of infant arousal and attention.
Aim: This paper distinguishes between the broad parallels that have been made
across investigations of maternal speech and singing and those that are
emerging from direct comparisons of speech and singing within the same
mother-infant dyads.
Main contribution: Some of the reported parallels between maternal speech and
singing break down on closer inspection. Nevertheless, the differences shed
light on the nature and function of such vocal interactions. For example, the
style of maternal singing is relatively stable over considerably longer periods
than is that of maternal speech. Moreover, the two types of vocal interaction
have distinct consequences for infant attention. When infants watch and listen
to their mothers sing, they seem to be mesmerised, as reflected in prolonged
gaze at mother and relative passivity. By contrast, episodes of maternal speech
lead to intermittent looking at mother but to greater vocal and gestural
responsiveness.
Back to index
Proceedings abstract
Roger A. Kendall
kendall@ucla.edu
Abstract
The majority of research in timbre has involved a single note drawn from the
middle range of the group of instruments studied. A common finding among such
studies is a strong mapping (r>.89) between long-time-average spectral centroid
(spectral center of gravity) and the principal dimension in multidimensional
scaling analysis of perceptual similarities when continuant steady-states are
considered. Another acoustical variable, often mapping to the second dimension,
is spectral flux or variability. In fact, it has been shown that when
synthetic emulations of instruments fail to capture these two variables, their
similarity to natural instruments suffers (Kendall, Carterette, Hajda, 1999).
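The spectral centroid referred to above has a simple definition: the amplitude-weighted mean frequency of a magnitude spectrum. As a rough single-frame illustration only (not the authors' actual long-time-average analysis procedure; the function name and the NumPy-based implementation are assumptions for the sketch):

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Spectral centroid (spectral 'center of gravity'):
    the amplitude-weighted mean of the frequencies present
    in the signal's magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))            # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

# Sanity check: a pure 440 Hz sine has (almost) all of its
# spectral energy at 440 Hz, so its centroid lies near 440 Hz.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(spectral_centroid(tone, sr))
```

A long-time-average version would average magnitude spectra over successive analysis frames of a sustained tone before computing the same weighted mean.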
Back to index
Proceedings paper
Singing on High: investigating the use of singing in Christian worship
Diana Meadows BA (Hons) MMus
42 Yarmouth Road
Blofield
Norwich
NR13 4LQ
Department of Music
University of Sheffield
Sheffield S10 2TN
Background
The role of singing in Christian worship has always been important. References to early examples can
be found in the Bible. In the Old Testament, in Exodus 15:1, after the Israelites had crossed the Red
Sea the Song of Moses was sung, and in the New Testament, in his Letter to the Colossians 3:16,
Paul writes:
Let the word of Christ dwell in you richly as you teach and admonish one another with all wisdom, and
as you sing psalms, hymns and spiritual songs with gratitude in your hearts to God.
Hymn singing is seen to play an important part in uniting a community during public worship, with
the congregation standing, or sitting, and singing as one. This can provide comfort and strength to the
individual as well as to the entire congregation (Tamke, 1978). Singing in worship provides an
opportunity to praise and worship God, provides a focus in prayer, and can be seen as an aid to
evangelism (Archbishop's Commission, 1992).
Music and dance have united communities for centuries, not only in a religious context but also within
battle and sport. The relationship between religion and sport is of special significance. Liverpool,
Everton, Celtic, Rangers, Manchester United and Manchester City Football Clubs have their roots in
Protestantism, Catholicism and Methodism. Christian rallies such as Spring Harvest, Easter People
and Soul Survivor share several characteristics with football matches (Percy & Taylor, 1997).
Historically these rallies have attracted large crowds. In the eighteenth century two great preachers,
John Wesley and George Whitefield, preached to crowds in excess of 20,000 people, while during the
nineteenth century, the American evangelists Ira D. Sankey and Dwight L. Moody drew similar
numbers to their meetings in the large cities in Britain.
There is anecdotal evidence recording the effects of singing at these meetings. In 1859, when Revival
came to Britain from America, there were reports that congregational singing changed from being
formal and constrained to joyful and full of praise, the result of which was a sense of peace (Phillips,
1989). This was followed by the Welsh Revival, where the quality of the singing was especially noted
and the crowds sang with great joy, making use of their bodies as well as their voices (Evans,
1969). Similar accounts can be found in reports of Christian events past and present.
Today thousands of people attend the large Christian events, and now, as in the past, singing by those
At the other end of the scale, ten churches have congregations of fewer than 50 people. Five churches,
four Anglican and one Evangelical Free, rarely use worship choruses, and five Anglican churches
never use them. The majority of these churches have no members under the age of 50 in their
congregations.
Many church leaders reported that they chose the hymns or choruses purely for the words, and gave
little thought to the setting of the verse or whether the congregation could sing them. Worshippers
found that in many cases, the tunes to some of these texts were unknown and difficult to sing, which
often led to dull, uninspired singing.
On the other hand, church leaders liaising with their musical directors or worship leaders chose
settings of the texts that enabled meaningful participation in worship.
A selection of reasons for including music in worship was given and leaders were asked to place them
in the order of importance. 'Praising God' was by far the most significant reason for the inclusion of
music, and 'establishing a mood' to aid worship followed this. 'Fostering a sense of community' and
'aiding evangelism' came at the bottom of the list. Interestingly, individual worshippers believed these
two points to be important factors for the inclusion of music.
Leaders were asked whether their congregations made specific requests for hymns or choruses. The
majority of congregations preferred a balance of traditional hymns and worship choruses. Some would
like to have the opportunity to sing new hymns or songs, but because of small congregations did not
have the confidence to attempt them. One congregation often commented specifically that they
wanted to enjoy "a good sing" when attending a service.
The wish to improve the music and singing in a service was widespread, but unfortunately resources
were limited, whether in personnel or by the restrictions of the building. One church, with a small
congregation, accompanied the singing with a flute, a trombone and a euphonium.
In reply to the qualitative questionnaire, it was clear that worshippers from all denominations
recognised the importance of singing in their worship. They reported that the involvement of joining
with fellow Christians in praising and worshipping God through song heightened their emotions,
encouraged an intimate personal relationship with God and provided a sense of belonging. A hymn or
worship song provided the opportunity for the individual to communicate on a personal level with
God, praising Him, giving Him thanks and asking Him into their lives. New traditional-style hymns
provided the opportunity to sing their concerns with texts written about issues of today such as
homelessness, the environment and racism.
Today's worshippers report that if the first hymn was dull, unknown or difficult to sing, this had a
detrimental effect on their enjoyment of the service, whereas if the first hymn was joyful and easy to
sing it lifted the spirits and prepared the way for worship. One person stated that they could be
enjoying a service, but when a hymn or song was sung which they did not like, it ruined the entire
service.
The confidence to sing was encouraged when the texts were set to music written in an accessible,
secular style. Many churches now make extensive use of musical instruments to accompany singing,
and this again helped to increase confidence. The use of percussion was particularly helpful in
heightening awareness and emotions. One church has a collection of African drums, tom-toms and
other percussion instruments for worshippers to use during worship.
The singing of hymns at funerals was found to help, as it provided the opportunity to express personal
grief and emotion.
A significant number of those worshipping at New Churches admitted that it was the singing that
drew them away from the traditional denominations. Others, who preferred quieter services with less
personal participation, found a church offering this. There is no doubt that one of the most important
factors for changing the place of worship is the music used for congregational singing.
The growing use of overhead projectors and computer-generated lyrics instead of hymnbooks initiates
other forms of personal involvement for the worshipper. Worshippers have more freedom to clap,
wave their arms or dance as they sing, leading to increased feelings of euphoria and well-being.
Several individuals reported they had experienced tears, joy, euphoria or ecstasy as a result of singing
within worship.
There is also a contextual element to worship. At an alternative Halloween service in Norwich
Cathedral, the congregation was identified, in the main part, as consisting of members of the
Evangelical and Charismatic churches. In this context, worshippers who would, in their less
traditional church buildings, have sung enthusiastically and with great feeling were controlled and
restrained in the Cathedral.
First-hand reports record that many worshippers became Christians during the singing of a hymn or
chorus, and they can remember vividly which hymn this was. Billy Graham, the great American
evangelist, provided an excellent example of how to build an atmosphere with the use of hymns and
choruses. The hymn 'Just as I am' (1834) led to his conversion, and this hymn has meant a great deal
to many new Christians. Other favourites include 'How great Thou art', again made popular by Billy
Graham and 'And can it be' by Charles Wesley, especially the fifth line of the fourth verse 'My chains
fell off'. These and other older hymns have remained popular with their strong melodies that are easy
to sing and members of congregations often sing spontaneously in four-part harmony. In the Billy
Graham Crusades in Britain during the 1980s a popular chorus was 'Majesty' to which many made a
commitment. Recently the modern worship choruses 'Lord, the light of your love' by Graham
Kendrick, and 'My Jesus, my Saviour' by Darlene Zschech have been sung with great assurance.
Conclusions
Worshippers may use singing in order to heighten an emotional experience. The simple repetitive
music found in worship choruses can be very effective in setting the atmosphere for a service.
Feelings of joy, praise, sorrow, love, compassion and contemplation can all be encouraged by music
with the help of a sympathetic worship leader or musical director, but it is not always possible to
know which particular hymn or chorus will affect members of the congregation, or when. Churches of
all denominations are becoming increasingly aware of and sensitive to this and are making use of singing
to promote a sense of community, both within and outside the church. The implications for the power
of singing in psychological terms are immense. There is a need for the theory of religious singing to
be developed, using music from the past and present.
References
Tamke, S. S.: Make a joyful noise unto the Lord (Ohio, Ohio University Press, 1978)
Archbishop's Commission: In Tune with Heaven (London, Church House, 1992)
Percy, M. & Taylor, R.: 'Something for the weekend sir? Leisure, ecstasy and identity in football and
contemporary religion': in Leisure Studies 16 (1997)
Phillips, T.: The Welsh Revival (Edinburgh, The Banner of Truth Trust, 1989)
Evans, E.: The Welsh Revival of 1904 (London, Evangelical Press, 1969)
Author's note
Diana Meadows BA (Hons), MMus is Chairman of Musical Keys, a Norfolk Charity providing music
and movement for children under the age of eight with special needs. The author has had a great deal
of experience as a music director in nonconformist churches.
This research is part of doctoral research undertaken under the joint supervision of Dr. Jane Davidson,
Department of Music and Rev. Canon Dr. Martyn Percy, Director of The Lincoln Theological
Institute, both at The University of Sheffield.
Proceedings paper
During the composition sessions, all on-screen manipulations of the program were unobtrusively recorded to
videotape through a video card installed in the computer. In addition to this videotape data, MIDI files were
saved under different name references via the 'save as' method (e.g., David 1, David 2, etc.) for each
participant at the end of each composition session. The videotape recordings of on-screen manipulations and
the MIDI files provided data on the process of composition for investigation.
Measure of participants' self-evaluations
Questionnaires were administered at two points in time: Time one, prior to both training in the use of the
Cubase program and engaging in computer-based composition; and Time two, after participants had completed their
computer-based compositions.
computer-based compositions. At time one, participants were asked questions designed to reveal their levels
of confidence in their ability to compose pieces of music in relation to 'other students' with, and separately
without, FIMT. At time two, participants were asked to evaluate their own compositions in relation to 'other
students' with, and separately without, FIMT. According to Diener and Dweck (1980) knowing the
adolescent's ratings of 'others' performance allows a clearer interpretation of the evaluation of their own
performance. For example, an adolescent may rate his or her own performance as 8 on a 10-point scale; but if
that adolescent thinks that most other adolescents would rate a 9 or 10 on the scale, then he or she may not
consider 8 to be a successful score. On the other hand, if the adolescent believed most other adolescents
would rate a 4 or 5, then his or her performance might be outstanding by comparison (p.994). Thus the
difference between the questions 'How good do you think most students who have had [have not had]
instrumental tuition are at composing pieces of music?' and 'How good do you think you are at composing
pieces of music?' (time one) was calculated. Also 'How good do you think the compositions of most students
who have had [have not had] instrumental tuition will be?' and 'How good do you think your composition
sounds?' (time two) was calculated.
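The difference-score logic borrowed from Diener and Dweck (1980) can be sketched as follows; the function name and the example values are illustrative only:

```python
# Sketch of the difference-score logic described above. Ratings are on a
# 10-point scale; the function name and values are illustrative, not the
# study's actual data.

def difference_score(own_rating, rating_of_others):
    """Positive: participant rates self above 'others'; negative: below."""
    return own_rating - rating_of_others

# An adolescent who rates themselves 8 but expects peers to score 9
# has a negative difference score despite the high absolute rating.
print(difference_score(8, 9))
# The same self-rating with a lower expectation of peers is positive.
print(difference_score(8, 4.5))
```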
Teacher evaluations of compositions
The completed compositions were recorded to CD for evaluation by specialist music teachers using
'consensual assessment' procedures (Amabile, 1982; Daignault, 1996; Hickey, 1998). Evaluations of the
compositions were made by four practising, experienced, specialist music teachers. Each teacher rated the
compositions separately, using a different CD on which the compositions were recorded in a different random
order. The compositions were identified by number only. The teachers made their evaluations using pre-prepared forms
that required them to listen to the CD twice. On the first listening they were asked to rate for 'overall
impression'. On the second listening, they were asked to rate for 'creativity' and 'craftsmanship'. 'Overall
impression' was rated using a 7 point rating scale (anchored from 1= very poor, 4= average, to 7= excellent).
'Creativity' and 'craftsmanship' were rated using a 7 point rating scale (anchored from 1= low, 4= medium, to
7= high). Instructions to the teachers included: 'Please try to use the full range of the scale from 1-7', 'When
rating for 'creativity' please consider the following dimensions: originality, novel use of timbres, novel
musical ideas and variety.', 'When rating for 'craftsmanship' please consider the following dimensions: form,
technical goodness, detail, complexity and overall organisation'. Teachers were also advised:
'Though it is certainly possible to give similar ratings on all three categories ('overall impression', 'creativity'
and 'craftsmanship') do not allow how you rate on one scale to necessarily affect how you rate the
composition on the others. Keep the ratings for each category separate as you listen to the compositions'.
Results
Participants' self-evaluations
T-tests were carried out to compare difference means at 'Time 1' and 'Time 2'; the means are displayed in Table 1.
Table 1. Mean scores (and standard deviations) of difference means for participants with and without FIMT
when compared to 'others' with, and separately, without FIMT at 'Time 1' and 'Time 2'

                    'Time 1'                          'Time 2'
                    vs. with FIMT   vs. without FIMT  vs. with FIMT   vs. without FIMT
Without FIMT (23)   4.09 (1.76)     0.61 (1.80)       1.48 (1.34)     -0.044 (1.02)
'Time 1'
Participants with and without FIMT rated themselves worse at composing than 'others' with FIMT but
participants with FIMT had significantly lower difference means than participants without FIMT (t = 4.95
(46), p<.001).
Participants with FIMT rated themselves better at composing than 'others' without FIMT, while participants
without FIMT rated themselves worse at composing than 'others' without FIMT; the difference between these
ratings was significant (t = 2.64 (46), p <.05).
'Time 2'
Participants with and without FIMT rated their compositions worse than the compositions of 'others' with
FIMT but participants with FIMT had significantly lower difference means than participants without FIMT (t
= 2.15 (46), p<.05).
Participants with and without FIMT rated their compositions better than the compositions of 'others' without
FIMT but participants with FIMT had higher difference means than participants without FIMT, although this
result failed to reach significance (t = 1.87 (46), p =.066).
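The group comparisons reported above can be reproduced in outline with an independent-samples t-test. The data below are fabricated stand-ins for the two groups' difference means, so only the procedure, not the numbers, mirrors the study:

```python
# Hypothetical sketch: comparing difference means between the FIMT and
# non-FIMT groups with an independent-samples t-test, as in the analyses
# reported above. The values are invented for illustration.
from scipy import stats

fimt_diffs = [1.2, 0.8, 1.5, 2.0, 0.5, 1.1, 1.7, 0.9]       # illustrative
non_fimt_diffs = [3.8, 4.2, 3.5, 4.9, 4.1, 3.9, 4.4, 4.6]   # illustrative

t, p = stats.ttest_ind(fimt_diffs, non_fimt_diffs)
df = len(fimt_diffs) + len(non_fimt_diffs) - 2   # pooled degrees of freedom
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```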
Chi-square analyses were carried out to investigate these findings further. Participants were cross-classified
according to whether or not they had prior experience of FIMT and, for 'Time 1', whether they rated their
ability to compose pieces of music the same as/better than, or worse than, 'others' with, and separately without,
FIMT; for 'Time 2', the cross-classification used whether they rated how good their composition sounded the
same as/better than, or worse than, 'others' with, and separately without, FIMT. The results are summarised in Table 2.
Table 2. Number (and percentage) of participants with and without FIMT according to whether they rated
their ability to compose pieces of music same/better than or worse than 'others' with and separately without
FIMT at 'Time 1' and whether they rated how good their composition sounded the same/better than or worse
than 'others' with and separately without FIMT at 'Time 2'.
                   'Time 1'                                        'Time 2'
                   vs. with FIMT          vs. without FIMT         vs. with FIMT          vs. without FIMT
                   same/better  worse     same/better  worse       same/better  worse     same/better  worse
With FIMT (25)     3 (12.0)     22 (88.0)  20 (80.0)    5 (20.0)   13 (52.0)    12 (48.0)  21 (84.0)    4 (16.0)
Without FIMT (23)  1 (4.3)      22 (95.7)  13 (56.5)   10 (43.5)    5 (21.7)    18 (78.3)  16 (69.6)    7 (30.4)
At Time 1 most participants with FIMT rated themselves 'worse' at composing than 'others' with FIMT but the
same or better at composing than 'others' without FIMT. At Time 2 participants with FIMT were more evenly
divided when comparing their completed compositions with 'others' with FIMT but there was little change
when comparing their compositions with 'others' without FIMT.
At Time 1 most participants without FIMT rated themselves 'worse' at composing than 'others' with FIMT but
were more evenly divided when comparing their completed compositions with 'others' without FIMT. At
Time 2 more participants without FIMT considered themselves the same or better when comparing their
completed compositions with 'others' with FIMT than at Time 1. This trend was repeated when comparing
their completed compositions to 'others' without FIMT.
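A chi-square analysis of this kind of cross-classification can be sketched as follows, using the 'Time 1' versus-'others'-without-FIMT counts from Table 2; scipy stands in for whatever statistics package the authors used:

```python
# Illustrative chi-square test of the kind described above: participants
# cross-classified by FIMT experience and by same/better vs. worse
# self-rating (counts taken from Table 2, Time 1 vs. 'others' without FIMT).
from scipy.stats import chi2_contingency

#               same/better  worse
table = [[20, 5],    # with FIMT (n = 25)
         [13, 10]]   # without FIMT (n = 23)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```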
To investigate these changes from Time 1 to Time 2, the difference means were entered as the
dependent measures in a repeated measures ANOVA with two between-subjects factors (FIMT/NON-FIMT
and gender) and two within-subject factors (Time 1/Time 2 and 'others' with/without FIMT). The results are
summarised in Table 3.
Table 3.
There were significant main effects for Time 1/ Time 2 (F=28.95, p< .001) and 'others' with/without FIMT
(F= 189.84, p< .001). There were significant interactions between Time x 'others' with/without FIMT (F=
27.77, p< .001) and Time x FIMT (F= 4.55, p<.05). There were no significant main effects or interactions for
gender.
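A simplified sketch of the repeated-measures part of this design is below, using statsmodels' AnovaRM on fabricated data. AnovaRM handles only the within-subject factors (Time and comparison target); the between-subject factors (FIMT group and gender) would require a mixed-design analysis, which is omitted here:

```python
# Sketch of the within-subject part of the repeated measures ANOVA described
# above: difference means as the dependent variable, Time (T1/T2) and the
# comparison target ('others' with/without FIMT) as within-subject factors.
# All data are fabricated; effect sizes are chosen arbitrarily.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subj in range(12):
    for time in ("T1", "T2"):
        for target in ("with_FIMT", "without_FIMT"):
            base = 2.0 if target == "with_FIMT" else -0.5   # target effect
            shift = -0.8 if time == "T2" else 0.0           # time effect
            rows.append({"subject": subj, "time": time, "target": target,
                         "diff": base + shift + rng.normal(0, 0.3)})

df = pd.DataFrame(rows)
res = AnovaRM(df, depvar="diff", subject="subject",
              within=["time", "target"]).fit()
print(res)
```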
Teacher evaluations of compositions
Mean scores were calculated for each teacher on each category (see Table 4).
Table 4. Mean scores (and standard deviations) by teachers for 'overall impression', 'creativity' and
'craftsmanship'
An examination of the correlation matrix for each teacher on each of the three categories ('overall impression',
'creativity' and 'craftsmanship') revealed that for all four teachers the three categories were highly correlated,
indicating that they were not differentiating between the categories. It was therefore decided to create an
'overall rating' for each teacher by calculating a composite mean of the 'overall impression', 'creativity' and
'craftsmanship' ratings. An examination of the full correlation matrix for this 'overall rating' revealed that the
scores for three of the teachers were significantly
correlated. Since no significant correlation was found between the ratings of one of the four teachers this
teacher's rating was omitted from the analysis. Further examination of this teacher's ratings may reveal some
interesting differences in approach, however for the purposes of the present study, a consensus of agreement
between the other three teachers suggested more reliable evaluations (see Table 5).
Table 5. Correlation between teachers for 'overall rating'
T1    T2    T3
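The reliability check described above (correlating raters' overall ratings, then dropping any rater who does not agree with the rest) can be sketched like this; the ratings and the outlier rater T4 are invented for illustration:

```python
# Sketch of the inter-rater reliability check described above: correlate
# teachers' 'overall ratings', then drop any rater whose ratings do not
# correlate with the others. All ratings here are invented.
import pandas as pd

ratings = pd.DataFrame({
    "T1": [6, 5, 3, 7, 4, 2, 5, 6],
    "T2": [5, 5, 2, 6, 4, 3, 5, 7],
    "T3": [6, 4, 3, 7, 5, 2, 4, 6],
    "T4": [3, 6, 5, 2, 6, 5, 3, 2],   # hypothetical outlier rater
})

corr = ratings.corr()
print(corr.round(2))

# Composite 'overall rating': mean across the raters who agree (T1-T3).
consensus = ratings[["T1", "T2", "T3"]].mean(axis=1)
print(consensus.round(2).tolist())
```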
increase in participants' overall levels of self-confidence as a result of engaging with the composition task.
Ratings at Time 2 were made in relation to the composition itself rather than speculation about composition
ability. Results also revealed participants with and without FIMT awarded lower ratings to their completed
compositions when compared to 'others' with FIMT but participants without FIMT awarded significantly
lower ratings than participants with FIMT. This lends support to a previous study which found children
without FIMT were more likely to rate their own compositions lower than children with FIMT (Seddon and
O'Neill, 1999). However, when comparing their compositions to 'others' without FIMT participants with and
without FIMT awarded higher ratings to their own compositions. When interpreting these results it is
important to note that the results of the teachers' evaluations found no significant differences between the
compositions based upon prior experience of FIMT. This means that either the adolescents are employing
different evaluation criteria than the teachers or their levels of self-confidence are influencing the accuracy of
their self-evaluations.
Implications
If self-assessment of adolescent computer-based composition is to be employed, issues of self-confidence in
relation to prior experience of FIMT need to be addressed to improve the accuracy of these measures. Based
upon the evidence of this study, it seems likely that self-confidence in computer-based composition
(regardless of prior experience of FIMT) will increase as a result of engaging with the process. Adolescents
should be encouraged to make self-evaluations of their compositions based upon the composition itself rather
than being influenced by their self-confidence in relation to their prior experience of FIMT.
References
Amabile, T. M., (1979). Effects of external evaluation on artistic creativity. Journal of Personality and Social
Psychology, 37(2), 221-233.
Covington, M.V., and Omelich, C.L. (1979). Effort: The double-edged sword in school achievement. Journal
of Educational Psychology, 71, 2, 169-182.
Daignault, L. (1996). A study of children's creative musical thinking within the context of a
computer-supported improvisational approach to composition. Unpublished doctoral dissertation. Chicago,
U.S.A.: Northwestern University.
Diener, C. L. and Dweck, C. S. (1980). An analysis of learned helplessness: II. The processing of success.
Journal of Personality and Social Psychology, 39, 940-952.
Folkestad, G. (1998). Musical learning as cultural practice: as exemplified in computer-based creative
music-making. In B. Sundin, G.E. McPherson, and G.
Folkestad (Eds.), Children composing: research in music education (pp. 97-134). Malmo Academy of Music:
Lund University.
Hickey, M. (1998). Consensual assessment of children's musical compositions. Submitted to Creativity
Research Journal January 20, 1998.
Scripp, L., Meyaard, J., and Davidson, L. (1988). Discerning musical development: Using computers to
discover what we know. Journal of Aesthetic Education, 22 (1), 75-88.
Seddon, F.A., & O'Neill, S.A. (1999). An evaluation study of computer-based compositions by children with
and without prior experience of formal instrumental music tuition. Accepted for publication Psychology of
Music January 1999.
Vispoel, W. P. and Austin, J. R. (1993). Constructive response to failure in music: The role of attribution
feedback and classroom goal structure. British Journal of Educational Psychology, 63, 110-129.
Vispoel, W. P. and Austin, J. R. (1998). How American adolescents interpret success and failure in classroom
music: relationships among attributional beliefs, self concept and achievement. Psychology of Music, 26, 1,
26-45.
Proceedings paper
1. Introduction
When listeners hear a musical stimulus, they immediately orient themselves in the sound and use surface
cues to make musical judgments, such as "this is by Bach" or "I hate this kind of music." This
orientation process is apparently pre-conscious, relating to basic auditory organization rather than to
high-level cognitive musical abilities. These musical judgments may be immanent or may lead to overt
acts such as foot-tapping, speech acts, musical gestures (such as vocalization) or other observable
behaviors.
Naturalistic real-world settings exist that provide opportunities to see these behaviors in action. Perhaps
the most significant is scanning the radio dial. A preliminary report on scanning-the-dial behavior and
its implications was recently presented by Perrott and Gjerdingen. They found that college students were
able to accurately judge the genre of a piece of music (about 50% correct in a ten-way forced choice
paradigm) after listening to only 250-ms samples. The kind of musical information that is available after
only 250 ms is quite different than the kind of information that is treated in the traditional sort of
music-psychology experiment (notes, chords, and melodies).
Immediate music-listening behaviors like these are fundamentally inexplicable with present models of
music perception. It is not at all clear what sort of cognitive structures might be built that could support
this sort of decision-making. The stimuli are too short to contain melodies, harmonic rhythms, or much
hierarchical structure. On the other hand, the spectral content, in many styles of music, is not at all
stationary even within this short duration. Thus, it seems quite possible that listeners are using dynamic
cues in the short-time spectrum at least in part to make these judgments. This sort of description makes
genre seem very much like timbre classification. Such a viewpoint is in concert with the writing of many
modern-day composers on the relationship between timbre and orchestration.
We define the musical surface to be the set of representations and processes that result from immediate,
preconscious, perceptual organization of an acoustic musical stimulus and that enable a behavioral
response. There are then three questions that immediately concern us. First, what sorts of representations
and processes are these? Second, what sorts of behaviors do they afford the human listener? Third, what
is the interaction between the representations and the processes as the listening evolves in time?
In this paper, we present exploratory experimental and computer-modeling research that investigates the
role of perceived complexity in the musical surface.
2. Listening Experiment
As part of a larger project on the perception and modeling of immediate music-listening behavior, we
conducted an experiment dealing with the human perception of musical complexity directly (along with a
number of other perceptual attributes that will not be reported here). We define this perceptual feature to
be the sense of how much is going on. It is the scale on which listeners can rate sounds along a range
from simple to complicated. This experiment was investigatory in nature and was not designed to test
any hypotheses in particular.
Overview of procedure
Thirty musically trained and untrained subjects listened to two five-second excerpts taken from each of
75 pieces of music. The subjects used a computer interface to listen to the stimuli and make judgments
about them. Among the judgments elicited was the subjects' sense of the music as simple or complex.
1. Subjects
The subjects were drawn from the MIT community, recruited with posts on electronic and
physical bulletin boards. Most (67%) were between 18 and 23 years of age, the rest ranged from
25 to 72 years. The median age was 21 years. Of the 30 subjects, 10 were male and 20 were
female, although there were no gender-based differences hypothesized in this experiment. All but
four subjects reported normal hearing. Twenty-two reported that they were native speakers of English, and
six reported that they were not.
Nine subjects reported that they had absolute-pitch (AP) ability in response to the question "As far as
you know, do you have perfect pitch?" No attempt was made to evaluate this ability, and it is not
clear that all respondents understood the question. However, as reported below, there were small
but significant differences on the experimental tasks between those who claimed AP and those
who did not. The subjects had no consistent previous experience with musical or psychoacoustic
listening tasks.
After completing the listening task, subjects were given a questionnaire regarding their musical
background, and thereby classified into three groups: M0 (nonmusicians, N = 12), M1 (some
musical training, N = 15) and M2 (experienced musicians, N = 3). No formal tests of audiology or
musical competence were administered.
Breakdowns of musical ability by age and by gender are shown in Table 1. Note that the
experiment was not counterbalanced properly for the evaluation of consistent demographic
differences.
       Male   Female   18-23   Older (three groups)
M0      1      11        9       0     2     1
M1      8       7        9       5     0     1
M2      1       2        2       0     1     0
2. Materials
The experimental stimuli were 5-second segments of real, natural music. Two non-overlapping
segments were selected at random from each of 75 musical compositions. The 75 source
compositions were selected by randomly sampling the Internet music site MP3.com, which hosts
a wide variety of musical performances in all musical styles by amateur and professional
musicians. Samples were mixed down to mono by averaging the left and right channels,
resampled to 24000 Hz, and amplitude-scaled such that the most powerful frame in the 5-second
segment had power 10 dB below the full-power digital DC. The music was not otherwise
manipulated or simplified. The stimulus set contains jazz, classical, easy-listening, country, and a
variety of types of rock-and-roll music.
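The signal preparation described above can be sketched as follows. The frame length is an assumption (the paper does not state one), and resampling to 24000 Hz is left to a resampling library:

```python
# Sketch of the stimulus preparation described above: mix to mono by
# averaging channels, then scale amplitude so the most powerful frame sits
# 10 dB below full power. The frame length is an assumed parameter, and
# resampling to 24000 Hz is omitted here.
import numpy as np

def prepare_stimulus(stereo, frame_len=1024):
    """stereo: float array of shape (n_samples, 2), full scale = 1.0."""
    mono = stereo.mean(axis=1)                      # average L and R channels
    n_frames = len(mono) // frame_len
    frames = mono[:n_frames * frame_len].reshape(n_frames, frame_len)
    peak_power = (frames ** 2).mean(axis=1).max()   # most powerful frame
    target_power = 10 ** (-10 / 10)                 # 10 dB below full power
    return mono * np.sqrt(target_power / peak_power)

rng = np.random.default_rng(0)
sig = prepare_stimulus(rng.normal(0, 0.1, size=(24000 * 5, 2)))
```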
It is worthwhile to explore the implications of this method of selecting experimental materials.
MP3.com is presently the largest music web site on the Internet, containing about 400,000
freely-available songs by 30,000 different performing ensembles. Using materials from such a site
enables studies to more accurately reflect societal uses of music than does selecting materials from
personal music collections. The materials are certainly more weighted toward rock-and-roll and
less toward music in the "Western classical" style than is typical in music-psychology
experiments. However, this weighting is only a reflection of the fact that the listening population
is more interested in rock-and-roll than it is in "Western classical" music.
A second advantage of selecting music this way is that scientific principles may be used to choose
the particular materials. In this case, since the set to be studied is a random sample of all the music
on MP3.com, it follows from the sampling principle that the results we will show below are
applicable to all of the music on MP3.com (within the limit of sampling variance, which is still
large for such a small subset). This would not be the case if we simply selected pieces from a
more limited collection to satisfy our own curiosity (or the demands of music theorists).
3. Detailed procedure
Subjects were seated in front of a computer terminal that presented the listening interface, as
shown in Figure 1. The interface presented six sliders, each eliciting a different semantic judgment
from the listener. The scales were labeled simple-complex, slow-fast, loud-soft,
interesting-boring, and enjoyable-annoying (only the first will be directly discussed here). The
subject was instructed that his task was to listen to short musical excerpts and report his judgments
about them. Three practice trials were used to familiarize the subject with the experimental
procedure and to set the amplification at a comfortable listening level. The listening level was
allowed to vary between subjects, but was held fixed for all experimental trials for a single
subject.
Figure 1
Each of the 150 stimuli (75 musical excerpts x 2 stimuli/excerpt) was presented in a random
order, different for each subject. When the subject clicked on the Play button, the current stimulus
was presented. After the music completed, the subject moved the sliders as he felt appropriate to
rate the qualities of the stimulus. The subject was allowed to freely replay the stimulus as many
times as desired, and to make ratings in any order after any number of playings. When the subject
felt that the current settings of the rating sliders reflected his perceptions accurately, he clicked the
Next button to go on to the next trial. The sliders were recentered for each trial.
The subjects were encouraged to proceed at their own pace, taking breaks whenever necessary. A
typical subject took about 45 minutes to complete the listening task.
4. Dependent measures
For each trial, the final setting of the simple-complex slider was recorded to a computer file. The
computer interface produced a value from 0 (the bottom of the slider) to 100 (the top) for this
rating on each trial. Any trial on which the slider was not moved at all (that is, for which the slider
value was 50) was rejected and treated as missing data for that stimulus. Approximately 5.2% of
the ratings were rejected on this basis.
The response variables were shifted to zero-mean and scaled by a cube-root function to improve
the normality of distribution. After this transformation, the responses (labeled SIMPLE for
brevity) lie on a continuous scale in the interval [-3.68, +3.68] and are bimodally distributed, with
modes at about ± 2.5. Two additional dependent variables were derived. The SIGN variable
indicates only whether the response was above or below the center of the scale; it is a binary
variable. The OFFSET variable indicates the magnitude of response deviation from the center of
the scale on each trial, without regard to direction. It is calculated by collapsing the two lobes of the
bimodal response distribution.
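A sketch of this response coding is below. The exact transform is inferred from the reported range: a signed cube root of the centered slider value gives endpoints of plus or minus 50^(1/3), roughly 3.68, matching the interval quoted above:

```python
# Sketch of the response transformation described above. The constants are
# inferred: centering the 0-100 slider at 50 and applying a signed cube
# root yields the reported range [-3.68, +3.68], since 50**(1/3) ~ 3.68.
import numpy as np

def transform_response(raw):
    """raw: slider value in [0, 100]; returns (SIMPLE, SIGN, OFFSET)."""
    if raw == 50:
        return None                      # unmoved slider -> missing data
    centered = raw - 50                  # shift to the center of the scale
    simple = np.sign(centered) * abs(centered) ** (1 / 3)
    sign = simple > 0                    # binary: above/below scale center
    offset = abs(simple)                 # magnitude, ignoring direction
    return simple, sign, offset
```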
Several analyses of variance were conducted to explore the relationship between subject demographics
and the rated judgments of complexity. Results are summarized in Table 2. In each case, the dependent
variable was OFFSET, calculated by collapsing the two lobes of the bimodal response distribution, since
the main judgment was not normally distributed. OFFSET measures the degree to which subjects use the
ends of the scale relative to the center. Rejecting the null hypothesis in an analysis of variance of
OFFSET (that there is no effect of the subject condition) is a sufficient condition to reject the null
hypothesis for the main variable, SIMPLE.
As seen in the table, each of the demographic variables had a significant effect on the subject ratings.
The first two effects, based on subject and stimulus number, were expected: some subjects consistently find
all stimuli to be more complex than do other subjects, and some stimuli are rated more complex by all
subjects than others. The rest of the effects were unexpected and difficult
to interpret. The means and 95% confidence intervals of OFFSET broken down by each of these
independent variables are plotted in Figure 2.
Independent variable    df    F    p
Figure 2
Experienced musicians (M2 subjects) used the ends of the scale slightly more than other subjects.
Subjects claiming absolute pitch used the ends of the scale slightly more. Subjects whose native
language was not English, female subjects, and older subjects also used the ends of the scale more.
Without many more subjects to fill out a complete multidimensional ANOVA, it is difficult to interpret
these small but significant differences. One possibility is that the independent variables shown here are
actually covariates of some unmeasured demographic variable that is more fundamental, perhaps
corresponding to social cohort. Small but consistent effects of subject demographics similar to these
have been measured in previous research on loudness judgments of natural music examples by Fucci et al.
3. Computational modeling
In parallel to the experimental research, we developed a psychoacoustic model that incorporates
submodels of tempo and rhythm perception, auditory scene analysis, and the extraction of sensory
features from musical stimuli. The auditory model is implemented as a set of signal-processing computer
programs. It operates directly on the acoustic signal, not from symbolic models of stimuli, and so can be
used to study naturalistic samples of music taken from compact discs or other acoustic sources.
1. Modeling technique
The psychoacoustic model extracted 16 features from each of the 150 musical excerpts. Brief
descriptions of the features are shown in Table 3. Scheirer provided more details on these features
and how they are extracted from musical signals. Note that there are no features that relate to the
cognitive structure of the musical signal. All of the features deal with sensory aspects of the
musical sound such as loudness, pitch, tempo, and auditory scene analysis.
Name    Meaning
The features were entered in a multiple-regression procedure, where they were used to predict the
mean complexity ratings for each stimulus that were collected in the experiment of Section 2.
(Even though the individual ratings were bimodally distributed, the mean stimulus-by-stimulus
ratings across all subjects were normally distributed, and so can be modeled with linear
regression). Two kinds of multiple regressions were computed. The first entered all features at
once, to determine how much of the mean complexity could be explained with this psychoacoustic
model. The second entered the features one-at-a-time in a stepwise regression procedure, to see
which features are most useful for explaining the primary degrees of freedom of the complexity
judgments.
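The two regression procedures can be illustrated with synthetic data. The features and ratings below are random stand-ins, and the least-squares helper is our own rather than anything from the original study:

```python
# Illustrative version of the two analyses described above: a regression
# entering all 16 features at once, then forward stepwise selection.
# Features and ratings are random stand-ins for the real data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 16))                  # 16 psychoacoustic features
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=150)   # synthetic mean ratings

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1 - resid.var() / y.var()

print(f"all features entered: R^2 = {r_squared(X, y):.3f}")

# Forward stepwise: repeatedly add the feature that most improves R^2.
selected, remaining = [], list(range(16))
for _ in range(5):
    best = max(remaining, key=lambda j: r_squared(X[:, selected + [j]], y))
    selected.append(best)
    remaining.remove(best)
    print(f"entered feature {best}: R^2 = {r_squared(X[:, selected], y):.3f}")
```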
2. Modeling results
The first model, in which all features were entered, was strongly significant, with R = 0.536 (p <
0.001). Thus, compared to the correlations with the counterpart stimuli calculated in Section 2.6,
the psychoacoustic model explains slightly more of the variance in the ratings (R2 = 0.294 for the
psychoacoustic model, compared with R2 = 0.250 for the counterpart ratings).
Figure 3
Further, when the psychoacoustic features and the counterpart ratings were included in a single
regression, the combined R2 value was 0.448. This is remarkably close to the result
(0.294 + 0.250 = 0.544) that would be obtained if the covariance explained by the counterpart
ratings were precisely orthogonal to that explained by the psychoacoustic model. This finding is
compatible with the hypothesis that the sources of complexity shared between each stimulus and
its counterpart are primarily cognitive (musical style, genre, use of lyrics) while the sources of
complexity captured in the psychoacoustic model are primarily sensory.
The second model, in which the psychoacoustic features were entered in a stepwise regression,
was strongly significant at every step, as shown in Table 4. The +/- signs on each feature in Table
4 indicate the direction of the partial correlation of that feature with the residual at that stage of the
stepwise regression (recall that larger values for SIMPLE indicate simpler stimuli). In total, five
features are entered in the stepwise model. Two of these are features that relate to the
auditory-scene-analysis of the signal (CHANCOH and VARIM) and two are features that relate to
the tempo and beat structure of the signal (VARIBI and BESTT). In some cases, the sign of the
partial correlation seems counterintuitive. For example, the negative VARIM partial correlation
indicates that, once the effects of CHANCOH are accounted for, stimuli are simpler when they
have a more-frequently changing number of auditory streams. However, since in each stage of the
stepwise regression, only the residual from the previous stage is being explained, it is impossible
to interpret the role of the later features without a more-detailed analysis of the feature covariance.
The most important conclusion is that a model based on only five psychoacoustic features can
explain nearly 20% of the variance in mean ratings of stimulus complexity.
3. Individual differences
The results in the previous section indicate only that the overall mean ratings can be predicted with
psychoacoustic models. It is also useful to explore individual within-subject ratings to examine whether
they, too, can be predicted with such a model. Since the individual ratings are not normally distributed, a
linear regression model is not appropriate. Rather, we converted the ratings into a binary response
variable (above center/below center) and used logistic regression to model this variable, called SIGN.
We computed 30 separate logistic regressions, one for each subject, using the 16 psychoacoustic features
to predict SIGN. That is, the logistic regression for a subject tries to predict whether the subject gave a
response above center, or below center, for each stimulus.
In the grand average, 73.6% of the responses were correctly predicted (50% is the chance level). There is
a clear advantage to using separate models for each subject. If a single model is used to predict the
responses of all subjects, only 58.6% of the responses can be predicted correctly.
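Per-subject logistic modeling of this kind can be sketched as follows, with synthetic features and responses, and with scikit-learn standing in for the authors' statistics package:

```python
# Sketch of the per-subject logistic-regression modeling described above:
# predict each subject's binary SIGN response from the psychoacoustic
# features. Data are synthetic; subject-specific weights stand in for
# individual listening strategies.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 16))                 # features for 150 stimuli

accuracies = []
for subject in range(5):                       # the study had 30 subjects
    w = rng.normal(size=16)                    # subject-specific weighting
    sign = (X @ w + rng.normal(size=150)) > 0  # that subject's SIGN responses
    model = LogisticRegression(max_iter=1000).fit(X, sign)
    accuracies.append(model.score(X, sign))    # in-sample prediction rate

print(f"mean proportion predicted: {np.mean(accuracies):.3f}")
```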
Of the 30 subjects, the responses of 14 of them (46.7%) could be modeled significantly well to the p <
0.05 level. The other 16 subjects could not be modeled in this fashion. For some of the nonmodeled
subjects, the difficulty was that the responses given by that subject were so heavily weighted to one side
of the complexity scale that the constant term in the model explains nearly all of the log-probability,
leaving no residual for the predictors. For example, for subject #30, more than 80% of his/her responses
were correctly predicted by the model, yet this performance can be expected reasonably often by chance
(p = 0.18). Such results indicate that a larger set and even broader range of stimuli is required to evaluate
these models more carefully.
There were no significant effects of the demographic variables on the proportion of responses that could
be predicted. The null hypotheses that musicians' responses are as easy as nonmusicians' to predict,
males' as easy as females', and so forth, cannot be rejected with this testing methodology.
Thirty independent stepwise logistic-regression analyses were also computed, to examine the various
features that helped to predict the different subjects' ratings. In these analyses, since the number of
predictors and thus the degrees of freedom are fewer, more of the analyses reach significance. Twenty-five of
the 30 subjects (83.3%) had their responses predicted significantly well with a logistic model containing
between one and five predictors. The most frequently-entered features were MEANMOD, entered in 8
of the 25 models; TEMPENT and VARIM, entered in 7 of the models; and BESTT, entered in 6 of the
models. All of the features except DYNRNG were entered in at least one model.
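A forward-stepwise procedure of this kind can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the plain gradient-ascent fit, the likelihood-ratio entry criterion, and the feature values are all assumptions; only the feature names (MEANMOD, TEMPENT, DYNRNG) come from the text.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Fit a logistic model (intercept + weights) by plain gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return w

def log_likelihood(w, X, y):
    ll = 0.0
    for xi, yi in zip(X, y):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        p = 1.0 / (1.0 + math.exp(-z))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def forward_stepwise(feats, y, max_terms=5, min_gain=1.92):
    """Greedily enter the feature that most improves the log-likelihood.

    min_gain = 1.92 mimics a p < .05 likelihood-ratio entry criterion
    (2 * delta-LL > 3.84); this is an assumption about the procedure.
    """
    def matrix(names):
        return [[feats[n][i] for n in names] for i in range(len(y))]
    entered, remaining = [], set(feats)
    ll_cur = log_likelihood(fit_logistic(matrix([]), y), matrix([]), y)
    while remaining and len(entered) < max_terms:
        best, best_ll = None, ll_cur + min_gain
        for name in sorted(remaining):
            X = matrix(entered + [name])
            ll = log_likelihood(fit_logistic(X, y), X, y)
            if ll > best_ll:
                best, best_ll = name, ll
        if best is None:
            break
        entered.append(best)
        remaining.discard(best)
        ll_cur = best_ll
    return entered

# One synthetic subject whose "above center" responses follow MEANMOD.
random.seed(1)
n = 60
feats = {"MEANMOD": [random.gauss(0, 1) for _ in range(n)],
         "TEMPENT": [random.gauss(0, 1) for _ in range(n)],
         "DYNRNG":  [random.gauss(0, 1) for _ in range(n)]}
y = [1 if 2.0 * m + random.gauss(0, 0.5) > 0 else 0 for m in feats["MEANMOD"]]
print(forward_stepwise(feats, y))
```

On data of this sort the driving feature is entered first, and features that add no likelihood beyond the entry criterion are left out.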
If many more subjects had been used in the study, it might be possible to divide them into groups based
on the features that predict their responses. But this is difficult when the features number more than half
the subjects as they do here. As one example of this sort of analysis, we divided the subjects into two
groups. The first group consisted of those subjects (N = 12) for whom MEANMOD or CHANCOH were
entered as predictors in the stepwise regression (these two features are strongly correlated, r = .270). The
second group consisted of the rest (N = 18). Using these two groups, we determined how many of the
intersubject correlations in rating patterns were significant, as was done for the whole subject pool in
Section 2.6. 62.1% of the 66 intersubject correlations in the first group and 49.0% of the 153 intersubject
correlations in the second group were significant at p < 0.05 or better. Thus both groups seem to be more
homogeneous, according to this metric, than the subject pool as a whole.
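This homogeneity metric, the fraction of subject pairs whose rating profiles correlate significantly, can be sketched as follows. The synthetic data and the use of Fisher's z-transformation for the significance test are illustrative assumptions; the paper does not state which test it applied.

```python
import math
import random
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two rating profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def frac_significant(ratings, z_crit=1.96):
    """Fraction of subject pairs whose profiles correlate significantly.

    Significance is judged with Fisher's z-transformation (a normal
    approximation), an assumption about how significance was assessed.
    """
    sig = total = 0
    for sa, sb in combinations(ratings, 2):
        r = max(min(pearson(sa, sb), 0.999999), -0.999999)
        z = math.atanh(r) * math.sqrt(len(sa) - 3)
        total += 1
        sig += abs(z) > z_crit
    return sig / total

# A homogeneous group (shared pattern plus noise) vs. a heterogeneous one.
random.seed(0)
base = [random.gauss(0, 1) for _ in range(20)]            # 20 stimuli
homogeneous = [[b + random.gauss(0, 0.5) for b in base] for _ in range(8)]
heterogeneous = [[random.gauss(0, 1) for _ in range(20)] for _ in range(8)]
print(frac_significant(homogeneous), frac_significant(heterogeneous))
```

A group sharing a common rating strategy yields a much higher fraction of significant pairwise correlations than a group of unrelated raters.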
This argument by itself is not conclusive, as it is somewhat circular (second-order statistics are used to
identify subjects to put into groups, who are then found with related second-order statistics to have
something in common). However, it indicates a method that a larger study might use to identify
groups of subjects who share common strategies for making complexity judgments. This is a first step
towards a broader study of individual differences in listening behavior.
4. Discussion
Let us return to the concepts put forth in the introduction. We assume for the moment that there is a stage of
perceptual processing that can reasonably be called the musical surface. How could we determine whether a
particular feature of music (complexity, in this case) is a surface feature, and whether a particular judgment or
behavior is based partly, mostly, wholly, or not at all on the surface features of music?
Of course we do not mean to argue that only surface information is used for making musical judgments.
Surely, low-level surface information and high-level cognitive information interact in complicated ways in any
music-perception situation. However, most previous research on music perception has focused exclusively on cognitive
cues, such as tonal constraints, melody construction and identification, and other structural aspects of music.
This approach limits both the styles of music that can be addressed (since the overwhelming majority of
cognitive-structural hypotheses about music perception narrowly target Western classical music) and the
explanatory power of the models. It is difficult to see how theories of music perception could ever relate to the
acoustic signal when the basic theoretical elements are so distinct from the sensory aspects of hearing.
The modeling of cognitive aspects of music perception must be considered in relationship to the sensory
modeling results that we have presented. The statistical results shown here demonstrate that significant
proportions, more than a quarter, of the variance in human judgments of complexity can be explained without
recourse to cognitive models. In other words, we have demonstrated that a sensory model suffices to explain a
significant proportion of the variance in this judgment. The only explanatory space left to cognitive models
remains in the residual.
The independence of the variance in judgments explained by the counterpart ratings and that explained by the
psychoacoustic model allows us to formulate a coherent hypothesis regarding these two factors: namely, that
the variance explained by the counterpart ratings is primarily due to cognitive or structural similarities and
differences among a set of stimuli, while the variance explained by the psychoacoustic model is primarily due to
sensory similarities and differences. One test for this hypothesis would be to control the length of the stimuli
used in the listening task, as was done by Perrott and Gjerdingen in their scanning-the-dial experiment. If the
hypothesis is correct, as stimuli become very short, the counterpart ratings should be able to explain relatively
less variance than the psychoacoustic model, because there will be little basis for examining structural
similarities and differences among the stimuli. In contrast, as the stimuli become longer, the counterpart ratings
should be able to explain relatively more variance, as the structural properties of the music become more
important for mentally summarizing it for comparison.
A pressing question regarding experimental judgments of the sort we have reported here is that of individual
differences. Although the intersubject variance in this task was small enough that experimental effects could be
observed, it still seems large relative to the ratings being made. It is obviously inadequate to divide listeners so
crudely into categories by their musical backgrounds.
Considering again the modeling results from Section 3, we can formulate several hypotheses regarding
individual differences. The question at hand is what sorts of differences there are among listeners. We
distinguish three hypotheses targeting only the sensory aspects of musical hearing (although we do not mean to
claim that this list is exhaustive):
H1. There are no important differences among listeners. Different listeners use essentially the
same features weighted the same way to make judgments.
H2. Individual differences are based on different weights applied to a single feature set. Each
listener extracts the same auditory cues from sounds, and then these cues are combined with
different weights to form judgments.
H3. Individual differences are based on different features of sound. Different listeners extract
different cues and combine them in idiosyncratic ways to form judgments.
The present results are not compatible with hypothesis H1. If H1 were true, then a single regression model
would be as good a model for subjects' judgments as the individually-adapted models. But we found that
individual models could predict the subjects' judgments much more accurately than a single model.
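The pooled-versus-individual comparison behind this argument can be sketched as follows. A tiny perceptron stands in for the paper's regression models, and the simulated subjects follow hypothesis H2 (same features, different weights); all of this is illustrative, not the study's actual data or method.

```python
import random

def perceptron(X, y, epochs=200):
    """Tiny perceptron classifier: a stand-in for the regression models."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(a * b for a, b in zip(w[1:], xi))
            err = yi - (1 if z > 0 else 0)
            if err:
                w[0] += err
                for j, xj in enumerate(xi):
                    w[j + 1] += err * xj
    return w

def accuracy(w, X, y):
    hits = sum(((1 if w[0] + sum(a * b for a, b in zip(w[1:], xi)) > 0 else 0) == yi)
               for xi, yi in zip(X, y))
    return hits / len(y)

# Six simulated subjects judging the same 40 two-feature stimuli,
# each with different per-subject weights (an H2 world).
random.seed(2)
stimuli = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(40)]
subj_w = [[1, 2], [2, -1], [-1, 1], [1, 0], [0, 2], [-2, -1]]
data = [(stimuli, [1 if wa * x[0] + wb * x[1] > 0 else 0 for x in stimuli])
        for wa, wb in subj_w]

pooled_X = [x for X, _ in data for x in X]
pooled_y = [yi for _, ys in data for yi in ys]
wp = perceptron(pooled_X, pooled_y)
pooled_acc = sum(accuracy(wp, X, ys) for X, ys in data) / len(data)
indiv_acc = sum(accuracy(perceptron(X, ys), X, ys) for X, ys in data) / len(data)
print(round(pooled_acc, 2), round(indiv_acc, 2))
```

Because the simulated subjects weight the same cues differently, no single model can fit them all, and the individually fitted models predict markedly better, which is the pattern that argues against H1.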
We did not collect enough data in this experiment to distinguish H2 and H3. Although it is clear that different
stepwise models enter different features, a few of the features are entered very often, and the overall space of
features is really quite small. In the music-psychology literature, there seems to be almost no discussion of
different listening strategies that listeners might adopt, the reason that different listeners (even those with
similar musical experience) hear different things in music, or the perceptual and cognitive bases of musical
preference. These topics must be considered crucial if we wish to develop a coherent psychology of
music-listening behavior. Continuing evaluation of these hypotheses, and other hypotheses regarding
individual differences in listening, awaits future research.
References
Erickson, R. (1985). Sound Structure in Music. Berkeley, CA: University of California Press.
Fucci, D., Petrosino, L., & Banks, M. (1994). Effects of genre and listeners' preference on
magnitude-estimation scaling of rock music. Perceptual and Motor Skills, 78(3), 1235-1242.
Perrott, D., & Gjerdingen, R. O. (1999). Scanning the dial: An exploration of factors in the identification of
musical style. Paper presented at the Society for Music Perception & Cognition, Evanston, IL.
Scheirer, E. D. (1998). Tempo and beat analysis of acoustic musical signals. Journal of the Acoustical Society
of America, 103(1), 588-601.
Scheirer, E. D. (1999). Sound scene segmentation by dynamic detection of correlogram comodulation. Paper
presented at the International Joint Conference on AI Workshop on Computational Auditory Scene Analysis,
Stockholm.
Scheirer, E. D. (2000). Music-Listening Systems. Unpublished Ph.D. thesis, Massachusetts Institute of Technology,
Cambridge, MA.
Back to index
Proceedings abstract
Patrik.Juslin@psyk.uu.se
Background:
Studies of music performance have been conducted for a hundred years. This
research has yielded a large body of findings regarding different aspects of
performance. In particular, a lot of research has concerned a phenomenon
referred to as performance expression; that is, variations in timing, loudness,
timbre, and pitch that form the so-called microstructure of a performance. A
number of different approaches to performance expression have been advanced,
but few attempts have been made to relate the different approaches.
Aims:
Main contributions:
Implications:
The preliminary evaluation of the GERM model suggests that (a) different
sources of expression can be integrated into a common model, (b) the model may
contribute to our understanding of how different sources of expression
interact, and (c) different performers might be characterized in terms of their
relative weights regarding different sources of expression.
Proceedings. Keynote
Klaus R. Scherer
University of Geneva
Klaus.Scherer@pse.unige.ch
Proceedings paper
Sounds may possess symbolic meaning stemming from their association with the parts of the body to
which they are "tuned in" through resonance.
The unconscious is always considered as something that lies "under" consciousness and its center, the
ego. We can easily observe that lower sounds resonate with the chest or even the stomach, while
higher sounds resonate with the throat and one or another section of the cranium. Throughout the
Indo-European tradition, beginning with the Vedas, the chest (heart) is associated with the will and
the passions, the stomach with drives or desires (unconscious and relatively richer in
energy), and the head, together with some sapient division, with reason or the mind (conscious but
very often lacking energy).
Such observations, as well as the everyday experience of speech and music, lead us to formulate the
basic assumption of the present model: a rising tune corresponds to the ego's more intensive
demand for energy, while a falling tune corresponds to a demand of reduced intensity.
energy demand, instead of its relative decrease in the major. Correspondingly, 3 - 4 = -1.
Now we can easily feel the reason why, in particular in the 17th and 18th centuries, music was often
considered as closely related to mathematics. (Mozart himself said that music is joy that wants to
calculate itself).
The parallel to mathematics is obvious here: the relation that determines whether the key is major or
minor corresponds, so to say, to the second derivative from the function of the rising of the tune.
Examples
In the two aforementioned examples (Almayev N., in press) we considered low energy demand in
combination with the high level of available energy, and high energy demand in combination with low
level of available energy. One can refer to the famous "Ride of the Valkyries" theme by R. Wagner as an
example of an intensive energy demand combined with a large amount of available energy.
Part2. Quasi-experimental study
The main assumption of the proposed approach is that the meanings of both natural-language words and
music can be described in the same way, through models of meaning realization. Being
living-through processes, meaning realizations are subject to the intentional modifications
described by Husserl, to which the two functions of psychic-energy management were added.
If this can be considered true, then the task of the mutual elucidation of music and natural language
becomes a real one. We need natural-language words to qualify musical pieces; and music,
for its part, being a very well structured quantitative system, could help to elucidate how the
meanings of words are realized internally.
Therefore, first of all, the reliability of the correspondence between verbal qualifications and musical
pieces should be checked. Will subjects who share a common native language and belong to the
same culture, but who differ in general and musical education, show consistency
in their qualification of different musical pieces?
In order to answer this question we (with my student L. Elkhimova, whose graduation diploma at the
Moscow State Open University this work was part of) conducted an empirical study in 1999.
Method
Two experts in music (teachers at a musical school) were asked to select 6 melodies (three by
19th-century Romantics and three "contemporary" rock pieces) and to describe each of them with the
4 adjectives that would suit that composition best. The experts were also asked to keep in mind, if
possible, metaphors of taste.
As a result, 24 descriptors were obtained. Some of them evidently implied an estimation of tempo
("agile", "turbulent"); others more or less resembled traditional adjectives from Osgood's (1976)
semantic differential; and several were very specific, such as "aerated like soda water" and "a little
bit sweet" ("Land of Confusion"), or "strict" ("Du hast"). The correspondence of adjectives across
different languages is a separate topic, so I will not try to translate the whole list.
The experts were asked only to propose metaphoric definitions for each composition, not to compare
the compositions to each other according to the definitions already made.
Hypothesis
Different subjects of the same culture but of different educational and social backgrounds will in
general agree with the experts in describing the musical compositions with adjectives of their native
language (Russian).
H1. Subjects' categorizations of the different compositions will differ significantly and will coincide
with those of the experts.
H0. Subjects' categorizations of the different compositions will be random and will not coincide with
the experts' estimations.
Stimuli
The following 6 compositions were selected: "Flight of a bumble-bee" by N. Rimsky-Korsakov,
"Hungarian Dance" by J.Brahms, F.Kreisler's "The Torments of Love", Genesis "I can't dance",
Genesis "Land of Confusion", Rammstein "Du hast".
Subjects
Twenty subjects (10 male, 10 female), predominantly young (17-35 years), with different musical
preferences and educational backgrounds participated. Each of them received 6 response blanks, one
per composition, each containing all 24 descriptors. They were asked to rate the correspondence of
each composition to every descriptor on a scale from 1 ("fits very poorly") to 7 ("fits very well").
Results
An ANOVA with repeated measures was applied, with the scores on each descriptor as the dependent
variable and composition as the levels of the independent (grouping) variable. The distributions of
all the dependent variables were quite close to normal. Data for each descriptor were analysed
separately. The compositions differed significantly on all the descriptors. Nine cases can be
considered "definite hits": the experts' description achieved the highest score in the subjects'
ratings, and there was a significant difference between the leading composition and the next closest
one. In 7 more cases two compositions, one of which was the one predicted by the experts, shared
first place with no significant difference between them.
In the other 8 cases the composition predicted by the experts was either among three or more
leading compositions or differed significantly from the first one.
Nevertheless, in almost all cases the prototypical compositions were rated significantly above the
mean on the corresponding descriptor.
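The analysis can be sketched as a one-way repeated-measures F statistic computed directly from a subjects-by-compositions score table. The synthetic ratings below are an illustration, not the study's data, and the exact ANOVA details in the paper may differ.

```python
import random

def rm_anova_F(scores):
    """One-way repeated-measures ANOVA F for a subjects x conditions table."""
    n, k = len(scores), len(scores[0])            # subjects, compositions
    grand = sum(map(sum, scores)) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_tot - ss_cond - ss_subj           # residual after removing subjects
    return (ss_cond / (k - 1)) / (ss_err / ((k - 1) * (n - 1)))

# Synthetic 1-7 style ratings on one descriptor: 20 subjects x 6 compositions,
# with composition 0 rated about three points higher than the rest.
random.seed(3)
ratings = [[random.gauss(7 if j == 0 else 4, 1) for j in range(6)]
           for _ in range(20)]
print(round(rm_anova_F(ratings), 1))
```

A composition that clearly leads on a descriptor, as in the "definite hits" above, produces a large F against the within-subject error term.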
We also performed regression analyses with the scores on each descriptor as the dependent variable
and tempo as the independent one.
The results were very different for different descriptors. The fit was high for descriptors that
evidently presupposed an estimation of tempo, low for all the taste-based metaphors, and moderate for
the rest. The greatest dependency on tempo (R-sq. = 0.661, S-function) was found for the descriptor
"energetic".
Quadratic and cubic functions frequently served as the best approximations of the various "U"- and
"inverted-U"-shaped dependencies of some descriptors on tempo. For example, for the descriptor
"lucid" the best approximation was a cubic function with R-sq. = 0.381, while a linear approximation
explained only about 15% of the variance.
For all the taste-based metaphor descriptors, the R-sq. for the dependency on tempo was very small,
only a few per cent.
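The linear-versus-cubic comparison can be sketched with an ordinary least-squares polynomial fit, here on synthetic "inverted-U" data, since the study's raw scores are not reproduced. The fitting code and the shape of the synthetic dependency are assumptions for illustration.

```python
import random

def polyfit(xs, ys, deg):
    """Least-squares polynomial coefficients via the normal equations."""
    m = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):                      # Gaussian elimination with pivoting
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))) / A[i][i]
    return coef

def r_squared(xs, ys, coef):
    mean = sum(ys) / len(ys)
    ss_tot = sum((y - mean) ** 2 for y in ys)
    ss_res = sum((y - sum(c * x ** i for i, c in enumerate(coef))) ** 2
                 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot

# Synthetic "inverted-U" dependency of a rating on tempo.
random.seed(4)
tempos = [random.uniform(60, 200) for _ in range(40)]
ratings = [6 - ((t - 130) / 40) ** 2 + random.gauss(0, 0.4) for t in tempos]
xs = [(t - 130) / 70 for t in tempos]         # centred/scaled for stability
lin = r_squared(xs, ratings, polyfit(xs, ratings, 1))
cub = r_squared(xs, ratings, polyfit(xs, ratings, 3))
print(round(lin, 2), round(cub, 2))
```

On a symmetric inverted-U relation the linear fit explains almost nothing, while the cubic fit captures most of the variance, mirroring the "lucid" result in the text.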
Discussion
In general, H1 may be considered accepted and H0 rejected. All the compositions differed
significantly on all the descriptors. All the descriptors may be considered adequate, because the
mean scores of the corresponding compositions were significantly higher than the overall mean.
As for the fact that in 15 cases the selected composition shared its primacy with one or more other
compositions, this is rooted in the experimental task as it was formulated for the experts. They had
to propose four descriptors for each composition, but not to evaluate which composition fitted a
given descriptor best of all. As a result, descriptors with a high loading on tempo were repeated
several times across compositions: "dynamic", "turbulent", "exciting" ("agitating") for the
"Bumble-bee"; "agile" for "Land of Confusion"; "energetic" for "Du hast"; "cheerful" for "I can't
dance"; and some others whose adequate translation puzzles me. Consequently, compositions with a
higher tempo either occupied the first rank or formed a whole group by contrast with the slow
"Torments of Love", as happened with the descriptor "cheerful", which was supposed to designate
"I can't dance".
The most striking and unexpected result for us was that the subjects reproduced the strange
taste-based metaphors "aerated" and "a little bit sweet". The mean scores of "Land of Confusion" on
those descriptors were significantly higher than those of any other composition. All my attempts to
propose a reasonable explanation for this finding have so far been unsuccessful.
The descriptor "strict" was also reproduced, with "Du hast" differing greatly from all the other
compositions. This may lead to a hypothesis about how "strict" is encoded in music.
"Strict" means that protentions of "not allowed" objects will be repressed quickly and without
hesitation. "Du hast" differs from the other compositions in its substantially greater number of
pauses, which break the melody: we are always left with some expectation that the previous sounds
will continue when the next pause occurs. This explanation, of course, has to be tested in a special
experiment that could identify the concrete temporal and other characteristics of a melody
responsible for encoding "strictness".
The meaning of "aired" (not to be confused with "aerated like soda water"), which was initially
proposed for "Land of Confusion", seems to be encoded by a constant rise of the tune, without any
significant dependency on tempo. First on this descriptor was the "Flight of the bumble-bee",
although second was the slow "Torments of Love", which is nevertheless characterized by an almost
constant rise of the tune.
Tempo, although a very important variable, cannot predict the results by itself. Even for
"energetic", for which R-sq. was the largest, the most energetic composition was "Du hast" at tempo
115, while the not significantly different "Bumble-bee" had tempo 188!
Nevertheless, regression analysis seems the most appropriate procedure for evaluating the influence
of musical variables on the rating of a given descriptor. Correlation, which presupposes linearity
and is so widespread in the different branches of psychology, can hardly be of much value here,
because most of the dependencies, as we have seen, differ considerably from the linear.
What is needed at the exploratory stage of investigating the relations between the meanings of words
and the meanings of musical pieces is the ability to include more significant variables in the
regression equations. Such variables might be: the number and duration of rises and falls of the
melody, the number and duration of pauses, the "speed" of the rises and of the falls, etc.
Unfortunately, we had neither software that could compute such statistics for melodies, nor time to
calculate them manually.
Concrete constants, for example the time at which the estimation of available energy takes place,
are a matter for experiments of a more precise character. We plan to apply single-subject designs
and paired comparisons of stimuli that vary only in the duration of the event under investigation in
order to determine these constants.
References
Almayev N. (1999a) Dynamic Theory of Meaning: New Opportunities for Cognitive Modeling. Web
Journal of Formal, Computational & Cognitive Linguistics. http://fccl.ksu.ru/fcclroot.htm .
http://fccl.ksu.ru/winter.99/cog_model/proceedings.htm
Almayev N. (in press). The Concept of Psychic Energy and the Phenomenon of Music. Analecta
Husserliana: The yearbook of phenomenological research / Published under the authority of the
World Institute for Phenomenological research and learning - Dordrecht: Kluwer Academic
Publishers.
Penultimate draft is available at:
http://www.psychol.ras.ru/strukt/ALMAEV/penult.htm
Anochin, P. K. (1978). Beiträge zur allgemeinen Theorie des funktionellen Systems. Jena: Fischer.
Bernstein, N. A. (1967). The Co-ordination and Regulation of Movements. Oxford: Pergamon Press.
Husserl, E. (1939). Erfahrung und Urteil. Prag: Academia.
Husserl, E. (1969). Gesammelte Werke Bd. 10. Vorlesungen über das innere Zeitbewusstsein. Den Haag:
Martinus Nijhoff.
Luescher, M. (1983). The Luescher Color Test. London-Sydney: Pan.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The Measurement of Meaning. Urbana: University of
Illinois Press.
Osgood Ch. E., (1976). Focus on Meaning vol.1.The Hague-Paris: Mouton.
Simonov P.V.(1985). The Science of the Human Higher Neural Activity and Artistic Creation.
Moscow: Nauka.
Sloboda, J. (1986). The Musical Mind. Oxford: Clarendon Press.
Proceedings paper
Introduction
Music can arouse deep and profound emotions within us, often in conjunction with external situations. However, one of the main questions
that remains unanswered is whether emotional feelings in response to music are due to inherent features of the music or are learnt by
association with concurrent events.
Cooke (1959) suggests that particular sequences of notes are always associated with, and express, particular emotions. He proposes sixteen
different basic sequences of notes, and gives examples from the repertoire of western classical music to support his proposal. Meyer (1956)
however suggests that emotion is aroused when a tendency to respond is inhibited. He argues that music sets up expectations, which produce a
mental response to complete these expectations. If the actual music is different from the mental expectation, then an emotional response is
produced. He points out that the communication of shared meaning can only take place within a cultural context. Only if one is familiar with
the music within a culture will one generate appropriate expectations. Sloboda (1991) demonstrated some empirical support for Meyer's ideas.
He asked musicians to describe musical passages that produced very intense emotional experiences, such as 'tears' or 'shivers down the spine',
and some of these experiences followed unexpected changes, such as a change of key or rhythm, or a new vocal or instrumental entry.
One approach to elucidating the question as to whether the emotional responses to music are learnt within a culture or are inherent to the
music is by cross-cultural studies of responses to music. If listeners can detect the emotional content of music from cultures with which they
are unfamiliar, then this would suggest that the emotional content is inherent in the music.
Several studies have been conducted using Hindustani classical music, which is prevalent throughout North India and Pakistan. This music
involves improvisation within a raga, which is a complex melody structure, similar to the western concept of mode, but more detailed.
Different ragas are traditionally associated with particular emotional feelings.
Deva & Virmani (1980) showed that Indian listeners are sensitive to the emotional content of classical Hindustani music. Studies comparing
Indian with western listeners have generally found differences in their sensitivity to such music. Castellano, Bharucha & Krumhansl (1984)
asked Indian and western listeners to rate how well a probe tone fitted a theme from each of ten ragas. Apart from the notes corresponding to
the tonic and fifth of the western musical scale, which are equally important on Indian scales, only Indian listeners were sensitive to the scales
underlying the ragas. Vaughn (1994) found that affective ratings by western musicians of melodies from ragas did not agree well with those
used by Indian musicians.
Gregory & Varney (1996) found differences between the responses of western and Indian listeners to Hindustani ragas. The listeners in this
study were all students living in Britain, and it could be argued that many of the Asian students were not so familiar with classical Hindustani
music, perhaps only coming across this music in the popularised form used in Indian "Bollywood" films. If so, this would tend to diminish the
extent of any differences. Gregory (1996) has however confirmed the differences between the emotional responses of listeners from different
cultural backgrounds in a more detailed study comparing western listeners with those from an Indian/Pakistani background, most of whom
were quite familiar with Hindustani classical music.
These studies would all seem to support the contention that the emotional responses to music are learnt within a particular culture. However
this may be an oversimplification. The possibility still exists that some features of music are universal across cultures in producing an
emotional response, while others are specific to particular musical cultures and are learnt within the culture.
Balkwill & Thompson (1999) propose a multiple cue model for the perception of emotion in music. They suggest that some musical features,
such as tempo and melodic or rhythmic complexity, may provide universal cues as to the emotional content, whilst other features such as
modality may be specific to particular cultures. Listeners may therefore rely on either, or both, of these types of cues depending upon their
cultural knowledge. They carried out an empirical study, playing Hindustani ragas to western listeners, and showed that the listeners were
sensitive to the intended emotions of joy, sadness and anger in the music, but not to that of peace. They also showed that listeners' judgements
of different emotions were significantly related to their ratings of certain musical features of the ragas. For example the perception of joy was
associated with fast tempo and low melodic complexity, whilst sadness was associated with slow tempo and high melodic complexity.
The present study looks at the issue from a different perspective, by using Qawwali music. Qawwali is a recognised musical genre in the
Indian subcontinent, but has unique characteristics related to its religious function. The term Qawwali (an Arabic word meaning "utterance")
applies both to the medium and to the occasion of its performance, the devotional assembly of Islamic mysticism (Sufism) in India and
Pakistan. Qawwali as music is a group song sung by qawwals. A group of qawwals is made up of a lead singer, one or two secondary singers
and musicians, and clapping junior members. Performers believe they have a religious mission: to praise the Name of Allah using rhythmic
handclapping, vigorous drumming on a barrel-shaped dholak, harmonium and a vast repertoire of sung poetry. Qawwals present mystical
poetry usually in either Farsi, Urdu, or Hindi. By repeatedly and hypnotically chanting salient phrases, they claim to transport audiences to a
[Stimulus table: piece, performers, duration (min); e.g. traditional Christian hymn "Come Thou long expected Jesus" by the St. Michael's Singers, 2 min.]
Results
For each adjectival scale, the mean responses of the four groups of listeners to the different pieces of music are shown in Figure 1.
Figure 1. Mean rating scores by each listener group on adjectival scales
The mean scores of the four groups of listeners were compared by means of a one-way analysis of variance, calculated for each piece of music
on each adjectival scale. The values of F and levels of significance are shown in Table 1.
Table 1. One-way analysis of variance comparing the difference between the groups of listeners on each adjective scale on each piece of
music.
References
Balkwill, L.-L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music: Psychophysical and cultural
cues. Music Perception, 17.
Proceedings paper
Music is known to be connected with emotion. Yet, it is evident that music is a product of cultural
development. Consequently it is nearly impossible to fully understand the meaning of music from a
different culture. This can even apply to different styles or epochs within one's own culture. Following
Dowling & Harwood (1986), there are musical signs of a symbolic character that can induce
emotions, provided they are familiar. The understanding of music often takes place at levels of a high
degree of complexity. Nevertheless, music is often supposed to be a "language" common to all humans.
If this is true, there must also exist signs which go back to early ages of human development and which
retained their meaning. They are very difficult to discover, because early communication has not been
fossilized. To get round this problem it seems useful to focus attention on the musical elements
used by parents in motherese to communicate with their infants. Motherese appears spontaneously
when there is need for it, and it has been shown to be cross-cultural (Grieser
& Kuhl, 1988; Fernald et al., 1989; Fernald, 1992; M. Papoušek, 1994). H. Papoušek (1985) suggests
that motherese is a revival of an early form of communication used to build a bridge to the preverbal
child.
In this special way of talking, several different pitch contours are produced by marked modulation of
the fundamental frequency and by prolongation of the underlying syllable. These melodic contours, as
they are called, are effective in mediating emotional messages. This is in line with the observation of
Williams & Stevens (1982) that the course of the fundamental frequency in time most clearly
characterizes a speaker's emotional state. The use of melodic contours is linked to social context.
To catch the infant's attention and encourage it to imitate or to take its turn in a dialogue,
parents increase the use of rising contours. By contrast, to soothe an infant, softly declining
falling melodic contours are used. Bell-shaped melodic contours, which decline softly, prevail in
approval of a desired behaviour such as smiling. To discourage unwanted behaviour, steeply declining
falling and bell-shaped melodic contours occur (Stern, Spieker & MacKain, 1982; Jacobson, Boersma,
Fields & Olson, 1983; Werker & McLeod, 1989; M. Papoušek, H. Papoušek & Symmes, 1991).
It is to these musical elements of parent-infant communication that H. & M. Papoušek (1995) trace back
the roots of musical development. So the question arises, whether these melodic contours, which are
known to transform into prosody of speech, remain effective in music, especially in the melody of
singing.
In order to answer this question, four different song categories were chosen with two each forming
contrasting pairs. They are connected to different social situations which also occur in parent-infant
contacts. These categories are 'Songs to arouse Attention' (A), 'Lullabies' (L) as soothing songs,
'Warriors' Songs' (W) and 'Praise Songs' (P).
The songs were taken from ethnological archives. They were required to come from a wide variety of cultures and to
have been transmitted orally only. Their melodic construction was analyzed for comparison with the linguistic
research. For this purpose the tapes were played at half speed and the pitch contours were transcribed
by hand. If the pitch contours are truly comparable to the melodic contours of motherese, they should be
distributed differently across the examined song categories, according to each category's purpose.
As a first result it can be noted that the melodies of the songs are really composed of single pitch
contours whose figures are similar to the melodic contours of motherese.
In accordance with the melodic contours of motherese they were specified as Level, Rising, U-shaped,
Falling, Bell-shaped, Sinusoidal and Complex.
The Level contours can be on one level, "1l", or on two levels like cuckoo calls, "2l".
To facilitate interpretation the following forms were subdivided further:
Rising melodic contours can rise steeply or softly. This is described as "st" or "so".
The same differentiation applies to the U-shaped contours which in motherese are also used to get the
child's attention.
Falling and Bell-shaped melodic contours can decline steeply or softly. That is correspondingly marked
"st" or "so".
The Sinusoidal melodic contours can be shaped differently, producing different effects. Therefore such
Sinusoidal contours which consist of lined-up Bell-shaped contours are called Sinusoidal-Bell-shaped
contours with the subdivisions SBst and SBso. Often Sinusoidal melodic contours contain large leaps,
named SLst and SLso respectively. Sinusoidal melodic contours can also be softly swinging, specified as
SSst or SSso.
Finally, Complex melodic contours comprise lip trills, short exciting cries, whistling, etc.
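The classification scheme above can be approximated by a small heuristic. The sketch below is only an illustration: the shape tests, the 1-semitone flatness cut-off, and the 4-semitone threshold separating "st" from "so" variants are my own assumptions, not values taken from the study.

```python
def classify_contour(pitches, steep=4.0):
    """Classify a pitch contour (a list of pitch values in semitones,
    sampled over time) into one of the basic contour forms named above.
    The flatness cut-off (1 semitone) and the `steep` threshold
    separating 'st' from 'so' variants are invented parameters."""
    lo, hi = min(pitches), max(pitches)
    first, last = pitches[0], pitches[-1]
    grade = "st" if hi - lo >= steep else "so"
    if hi - lo < 1.0:                          # essentially flat
        return "Level"
    if first == lo and last == hi:             # overall rise
        return "Rising-" + grade
    if first == hi and last == lo:             # overall fall
        return "Falling-" + grade
    if pitches.index(hi) not in (0, len(pitches) - 1):
        return "Bell-shaped-" + grade          # peak in the middle
    if pitches.index(lo) not in (0, len(pitches) - 1):
        return "U-shaped-" + grade             # trough in the middle
    return "Complex"                           # none of the simple shapes
```

For example, `classify_contour([0, 2, 5])` returns a steeply rising contour, while `classify_contour([3, 2, 0])` returns a softly falling one.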
As a second result it can be noted that the composition of forms of melodic contours shows
different focal points in the four song categories.
This is presented in the table below.
Song Amount of Melodic Contours (%)
category Level Rising U-Shaped Falling Bell-shaped C / sinusoidal
n = 22 15.39 1.67 19.40 2.51 1.84 0.67 7.86 11.87 2.17 3.85 0.50 0.17 7.69 8.03 0.67 15.72 0.00
n = 38 11.90 1.25 3.83 2.79 2.71 0.83 22.18 12.81 5.37 4.12 0.13 0.54 11.15 10.98 1.21 1.46 6.74
n = 42 5.30 0.51 1.99 1.45 0.40 1.88 9.05 12.86 7.80 7.68 3.47 5.75 10.13 12.81 4.61 12.52 1.76
n = 83 3.01 1.02 3.40 1.60 0.83 1.84 8.49 15.28 8.64 11.79 1.31 2.81 12.03 11.50 3.59 12.42 0.44
Table 1. Distribution of forms of melodic contours in the four song categories (%). The upper numbers
always show the whole amount of the respective form. The differentiated values are given in the
numbers below.
As can be seen from the table, steeply Rising melodic contours prevail in the 'Songs to arouse Attention'
and, compared to other song categories, they have the largest amount of 2l-melodic contours. This is
reminiscent of the findings about corresponding social contexts in linguistic research. Also, the
1l-melodic contours as well as the SSso form of Sinusoidal melodic contours are emphasized. 1l-
melodic contours are presumed to fix the listener's attention and the softly Swinging Sinusoidal melodic
contours seem to have a similar effect, but in a somewhat moderate manner.
The 'Warriors' Songs' are characterized by falling melodic contours with the focal point on the steeply
declining version, which also fits the results on motherese. Additionally, 1l-melodic contours are of
importance in these songs. Sinusoidal melodic contours containing large leaps (SLst and SLso) account
for a large share. Also noteworthy is the number of Complex contours compared to the other song categories.
The latter two have the effect of being rather arousing. So they may support the effect of the steeply
declining Falling melodic contours.
In 'Praise Songs' both versions of Falling melodic contours with the focal point now on the softly
declining form and both versions of Bell-shaped melodic contours predominate. Additionally they
contain the largest number of Sinusoidal melodic contours. All of this is reminiscent of the preferential
use of melodic contours in parent-infant communication concerned with rewarding.
'Lullabies' also show mainly Falling- and Bell-shaped melodic contours, but with the difference that now
the focus is in each case on the softly declining part. Sinusoidal melodic contours are also highly
important. Interestingly, only the amount of Sinusoidal-Bell-shaped forms is smaller than in the 'Praise
Songs'; the melodic contours in 'Lullabies' are of less complexity in accordance with the infant's capacity
of reception.
However, certain reservations have to be made. On closer inspection we will find at least four different
types of lullabies. They differ with regard to the composition of melodic contours and, connected to that,
they differ in function. Lullabies can be very soothing, moderately soothing, entertaining, or even of
a warning character (Cordes, 1998). The prevailing type of lullaby, however, is the moderately soothing one,
which also praises the infant's behaviour. Therefore the averages reflect those contours which are known from
motherese to have the corresponding effect, i.e. softly declining Falling- and Bell-shaped melodic
contours. This explains why 'Lullabies' and 'Praise Songs' look rather similar with regard to the
composition of melodic contours.
Summarising the findings it can be stated that the outstanding forms of melodic contours of each
song category are comparable to those which are preferably used in corresponding social contexts
in motherese.
Finally there is one more important fact which came out. From the duration of a song and the number of
its melodic contours the average duration of the melodic contours was calculated. Thereby it became
obvious that average duration differs according to song category. In 'Songs to arouse Attention' they are
longest, with 5.14 sec on average, followed by 'Lullabies' and 'Praise Songs' with 4.51 sec and 3.99 sec
of average duration respectively. The melodic contours of 'Warriors' Songs' turned out to be shortest, with 2.50
sec on average. These findings are particularly relevant when placed in relation to the ethologist
Tembrock's (1971) research. According to him, in the acoustic system of animals, the control of distance
is very important and he claims that distance-reducing calls, which have an attracting effect, are
characterized by a relatively long rise time, until they reach full amplitude, as well as by longer temporal
extension. In vertebrates they have a tonal character with dominating frequencies. In contrast, calls
which widen the distance between individuals reach maximum amplitude quickly and are of short
duration. They are not or only irregularly repeated and are of a noisy character. The broad spectrum of
frequencies prevents or restricts an adaptation in recipients and causes them to seek refuge, often in
flight. Spittka (1969) was able to show that rats, given the opportunity to choose between different
sounds, prefer those with long rise times, of longer duration and with rhythmic repetition.
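The duration comparison above rests on a simple calculation: each song's duration divided by its number of melodic contours, then averaged within a song category. A minimal sketch, in which the example data are invented placeholders rather than values from the corpus:

```python
def mean_contour_duration(songs):
    """Average melodic-contour duration for one song category.
    `songs` is a list of (song_duration_seconds, contour_count) pairs;
    each song contributes its own mean contour duration."""
    per_song = [duration / n_contours for duration, n_contours in songs]
    return sum(per_song) / len(per_song)

# Invented example: two songs in a category, 36 s with 7 contours
# and 20 s with 8 contours.
print(round(mean_contour_duration([(36.0, 7), (20.0, 8)]), 2))  # -> 3.82
```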
In communication with animals we use these effects, though unconsciously, when luring or shooing
them away. The findings of M. Papoušek et al. (1990) prove that these effects are meaningful in
preverbal parent-infant communication, too. They have shown that expanded melodic contours manage
to attract the attention of four-month-old infants, whereas a short rising-falling contour fails to do so.
Relating these findings to my own results it is obvious that the temporally stretched melodic contours of
'Lullabies' and 'Praise Songs' and contrastingly the short duration of melodic contours in 'Warriors'
Songs' conform to Tembrock's statements regarding animal acoustic behaviour. The same is true for the
large amount of steeply declining Falling melodic contours in 'Warriors' Songs', which just start at the
highest point, as well as for those Bell-shaped melodic contours that rise and decline suddenly.
Following Tembrock, they have the effect of being most aggressive. In line with this is the outstanding
number of noisy complex melodic contours in this category and finally the noticeably larger number of
different forms of melodic contours which are used in a song. By contrast, the prevailing Falling- and
Bell-shaped melodic contours of 'Lullabies' and 'Praise Songs', which belong to pleasant situations,
mainly decline softly. A moderately sudden decline serves to keep the listener's attention (M.
Papoušek, H. Papoušek & Symmes, 1991).
Finally, the especially long duration of melodic contours in 'Songs to arouse Attention' has a counterpart
in the 'long-distance calls' of animals which are used to call for someone very distant or as alarm calls
(Tembrock, 1971).
Conclusion
The melodies of songs have been shown to be composed of melodic contours similar to those that
parents of preverbal infants use to convey emotional meanings. This correspondence suggests that the
primordial connection of music and emotion has developed from an early form of human
communication. Additionally, the correspondence of important features of melodic contours with
features in animal sounds suggests that in some ways the emotional meaning of these human acoustic
figures can be traced back to the pre-human state of development. One can therefore suppose that musical
expression and comprehension originate, at least in part, on a basic level shared by all humans.
Indeed, a large number of researchers have shown that pitch contour is highly important for the recognition of
melodies, especially by children and untrained adults. While this lends general
support to my findings, further research is needed to buttress them.
References
Cordes, I. (1998) Melodische Kontur und emotionaler Ausdruck in Wiegenliedern. In K. E. Behne, G.
Kleinen & H. de la Motte-Haber (Eds.). Musikpsychologie: Jahrbuch der Deutschen Gesellschaft für
Musikpsychologie. Göttingen, Bern, Toronto, Seattle, Hogrefe-Verlag.
Fernald, A. (1992) Meaningful Melodies in Mothers' Speech to Infants. In H. Papoušek, U. Jürgens & M.
Papoušek (Eds.). Nonverbal Vocal Communication: Comparative and Developmental Approaches.
Cambridge University Press, Cambridge and Editions de la Maison des Sciences de l'Homme, Paris. pp.
262-282.
Fernald, A., Taeschner, T., Dunn, J., Papoušek, M., De Boysson-Bardies, B. & Fukui, I. (1989). A
Cross-Language Study of Prosodic Modifications in Mothers' and Fathers' Speech to Preverbal Infants. Journal of
Child Language, 16, 477-501.
Grieser, D. L. & Kuhl, P. K. (1988). Maternal Speech in a Tonal Language: Support for Universal
Prosodic Features in Motherese. Developmental Psychology, Vol. 24, No. 1, 14-20.
Jacobson, J. L., Boersma, D. C., Fields, R. B. & Olson, K. L. (1983). Para-linguistic Features of Adult
Speech to Infants and Small Children. Child Development, 54, 436-442.
Papoušek, H. (1985) Biologische Wurzeln der ersten Kommunikation im menschlichen Leben. In W.
Böhme (Ed.). Evolution und Sprache: Über Entstehen und Wesen der Sprache. Tron, Karlsruhe. pp.
33-47.
Papoušek, H. & Papoušek, M. (1995) Beginning of Human Musicality. In R. Steinberg (Ed.). Music and
the Mind Machine. Verlag Springer, Berlin- Heidelberg-New York-London-Paris-Tokyo-Hong
Kong-Barcelona-Budapest. pp. 27-34.
Papoušek, M. (Ed.) (1994) Vom ersten Schrei zum ersten Wort. Anfänge der Sprach-entwicklung in der
vorsprachlichen Kommunikation. Verlag Hans Huber, Bern-Göttingen-Toronto-Seattle.
Papoušek, M., Papoušek, H., Bornstein, M. H., Nuzzo, C. & Symmes, D. (1990). Infant responses to
prototypical melodic contours in parental speech. Infant Behavior and Development, 13, 539-545.
Papoušek, M., Papoušek, H., Symmes, D. (1991). The meaning of melodies in motherese in tone and
stress languages. Infant Behavior and Development, 14, 415-440.
Spittka, O. (1969) Akustische Wahlversuche mit Albino-Ratten in der Skinner Box. PhD.,
Humboldt-Universität, Berlin. Unpublished.
Stern, D. N., Spieker, S. & MacKain, K. (1982). Intonation Contours as Signals in Maternal Speech to
Prelinguistic Infants. Developmental Psychology, Vol. 18, No. 5, 727-735.
Tembrock, G. (Ed.) (1971). Biokommunikation. Informationsübertragung im biologischen Bereich, Teil
II. Akademie-Verlag, Berlin.
Werker, J. F. & McLeod, P. J. (1989). Infant Preference for Both Male and Female Infant-Directed Talk:
A Developmental Study of Attentional and Affective Responsiveness. Canadian Journal of Psychology,
43 (2), 230-246.
Williams, C. E. & Stevens, K. N. (1982) Akustische Korrelate diskreter Emotionen. In K. R. Scherer und
P. Ekman (Eds). Approaches to Emotion. Lawrence Erlbaum Associates, Inc. Publishers, Hillsdale, New
Jersey. pp. 307-325.
Back to index
Proceedings paper
MELODIC MUSICAL INTERVAL OCCURRENCE AND PERCEIVED EMOTIONS IN CLASSICAL AND SERIAL MUSIC
Marco Costa, Serena Rossi, Luisa Bonfiglioli, Pio Enrico Ricci Bitti
Department of Psychology, University of Bologna, Italy
costa@psibo.unibo.it
Aim. The aim of this study was to investigate the relations between the statistical occurrence of the different musical intervals in melodies and the evoked emotions. The
hypothesis was that the expression of a particular emotion in music is associated with a distinct interval frequency distribution.
Method. Thirty-four melodic pieces from tonal classical music and four melodic pieces from serial music (Le Pierrot Lunaire by Schönberg) were chosen and presented to a
sample of 19 students without specific musical training, who evaluated their emotional content on a semantic differential composed of 10 bipolar adjective scales.
Melodies that polarized on each adjective were selected and a complete intervallic analysis was performed. Ignoring pauses, each interval was classified into one of 15
categories. An interval frequency distribution was then obtained and analyzed for the melodies that characterized each adjective.
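The intervallic analysis can be sketched as follows. The category names below and the decision to fold the tritone into a single "dim/aug" bin are my assumptions; the abstract does not spell out the paper's exact 15-category scheme.

```python
from collections import Counter

# Assumed category names, indexed by absolute interval size in semitones.
NAMES = ["unison", "min II", "maj II", "min III", "maj III", "perfect IV",
         "dim/aug", "perfect V", "min VI", "maj VI", "min VII", "maj VII",
         "octave"]

def interval_category(semitones):
    """Map a melodic interval (signed semitones) to a category label."""
    d = abs(semitones)            # melodic direction is ignored
    return "beyond octave" if d > 12 else NAMES[d]

def interval_distribution(melody):
    """Relative frequency of interval categories in a melody given as
    a list of MIDI pitch numbers (pauses already removed)."""
    counts = Counter(interval_category(b - a)
                     for a, b in zip(melody, melody[1:]))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}
```

A melody repeating a note and then rising a whole tone, e.g. `[60, 60, 62]`, yields half unisons and half major seconds.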
Results. A global interval ranking showed that in tonal music unisons and seconds accounted for 66.5% of all intervals, and intervals up to the perfect fifth accounted
for 92.5% (Figure 1). In Schönberg's pieces there was a significantly lower frequency of unisons, min II, maj II and min III, and a greater frequency of
diminished and augmented intervals within the octave (Figure 2). Melodies rated as pleasant and agreeable showed a higher presence of unisons, maj II, min III and perfect
IV, and a lower occurrence of maj VII, diminished and augmented intervals, and intervals greater than the octave (Figure 3). Melodies expressing unhappiness contained a higher
occurrence of unisons, whereas melodies expressing happiness contained a higher occurrence of maj VI, the octave, and intervals greater than the octave (Figure 4).
Melodies with positive affective connotations (relaxed, stable, calm, serene, carefree) were characterized by frequent use of maj II, min III and perfect IV. By contrast,
melodies with negative affective connotations (restless, unstable, fearful, worried, anguished) were characterized by a higher occurrence of unisons, min II, and diminished and
augmented intervals (Figure 5). Melodies evoking power had a higher frequency of unisons, a lower frequency of min II, maj II, min III and maj III, and a higher frequency of
diminished, augmented, and intervals wider than the octave (Figure 6).
Conclusions. The study of interval occurrence proved to be an effective tool for investigating the emotional content of melodies. Furthermore, this research showed that the
dimensional property of musical intervals (their span, ranging from the unison to intervals greater than the octave) is important for explaining perceived emotions in
melodies.
Figure 1
Figure 3
Figure 4
Figure 5
Figure 6
Back to index
Proceedings abstract
THE INFLUENCES OF A CONCURRENT AUDITORY FREQUENCY ON THE PERCEPTION OF AN
AMBIGUOUS VISUAL STIMULUS
This study began to bridge the gap between visual film and musical underscoring
by examining the relationships between very basic and simple visual and
auditory stimuli. Allowing for precise laboratory control over stimuli, it was
somewhat removed from the level of complexity involved in musical underscoring.
Even so, robust effects of auditory frequency were found, thus helping to
establish the role of auditory stimuli in the interpretation of visual stimuli
on a firm scientific basis.
Back to index
Proceedings paper
Background
The study of historical performance practice has occupied a prominent place in musicological investigations of baroque music
in the 20th century, but particularly since the 1960s. Instruments of the period have been revived and contemporary documents,
including treatises of performance practice, studied and interpreted (Dolmetsch 1949, Donington 1963, Hubbard 1965,
Neumann 1978, 1982, 1989). By the 1980s-1990s a consensus began to be formed as to the characteristics of a historically
stylish performance of baroque compositions. Although certain publications of the field, especially the more philosophically
and psychologically oriented ones (Farnsworth, 1969; Kenyon 1984, 1988, Kivy 1995, Taruskin 1995, Wiora 1968), implicitly
acknowledged that taste and enculturation have a major part in evaluating musical performances, musicologists failed to study
the performance style and its components from an empirical point of view. In other words, the listening process has largely
been ignored. The current project undertook the first step in rectifying this situation.
During her investigations of baroque performance practice issues the first author (Fabian, 1998) questioned the validity of some
of the claims musicologists made regarding the important characteristics of the historically informed performance. While they
often referred to tempo, ornamentation and use of baroque instruments, her experience suggested that these elements were not
as crucial as pulse or meter, articulation and accenting. Before being able to discuss this issue, a broader question needed to be
addressed: what are the dimensions that underlie listeners' perception of baroque music? To address this problem, we
decided to adopt an empirical approach. The study reported here is largely exploratory; however, we propose that key dimensions of
the perception of baroque music will be related to the constructs of stylishness and expressiveness. This is in line with the view
of certain writers (eg. Avison, 1752; Babitz, 1952, 1967, 1970; Donington, 1973, 1989; Rosenstiel, 1972; Schulenberg, 1990)
who look at the performer's perspective, rather than that of the listener.
Stylishness refers to the appropriateness of a performance in its historical and musical context: adherence to historical
performance circumstances, such as the size of the ensemble and the type of instruments, the use of historically documentable
instrumental techniques, such as fingering, tonguing and bowing, and other period performance practices such as articulation,
ornamentation and metre (as understood by eighteenth-century practitioners). In this sense, stylishness is the most important
issue in performance practice theory. Significantly, its evaluation depends on perception as well, and not just on musicological
prescriptions. Expressiveness generally refers to interpretative and individual aspects of a performance. However there is a
crucial need to identify the parameters of musical expressiveness as distinguished by various musical styles. For example, in
baroque music, using rhetorical gestures and a speech-like flexibility in rhythmic patterns is perfectly in line with the period's
performance practice. This is a kind of expressiveness that is peculiar to baroque and distinct from the common understanding
of expressiveness, such as rubato used in reference to romantic music.
Aim
Stimuli
Five recordings were chosen of each of two musical excerpts: bars 82-125 of the 1st movement (Allegro) of J. S. Bach's
Brandenburg Concerto No. 4 (ca. 1 minute in length), and bars 1-16 of the 2nd movement (Adagio) of J. S. Bach's Brandenburg
Concerto No. 1 (ca. 1 minute 50 seconds; see Appendix for details). These were played from a cassette tape in two
different orders. The choice of the examples was based on the first author's study of over 40 complete Brandenburg sets
(Fabian, 1998), and represented various aspects of debatable performance practice issues, such as choice of instruments,
phrasing, articulation, tempo, tone production, clarity of texture, and so on.
Participants
44 volunteers took part in the study. Approximately half of them were 3rd year or graduate music students at the University of
New South Wales or musicians specialising in performing on baroque instruments. The other group consisted of novices (i.e. 1st
year introductory music students at University of New South Wales). Only pooled data responses are reported here.
Procedure
Participants were seated in a classroom and first filled in a questionnaire regarding their musical background. Following this
they received brief instructions about how to complete the inventory. Three versions of the inventory were distributed to
participants. Each version had a different order of items and different order of scale poles. Participants were encouraged to
provide their first reaction to each example. Each example was played once (cassette player operated by instructor) and was
followed by enough time (silence) for everybody to complete his or her answers. The session took approximately 60 minutes.
Results
Factor analysis was conducted, with components extracted for eigenvalues greater than 1, using SPSS Version 6.1.1 for
Macintosh software. The first three factors explained 45.8% of variance, and the first two explained 27.1% and 11.6%
respectively. Factor loadings are shown in Figure 1. The items which loaded onto the first factor were of two kinds: those
which were related to the stylishness of the musical items, such as Speech-like, Articulated, Clear Structure and, of course
Stylishness; and those which required value judgements (Good-Articulation, Good-Performance, Good-Pulse, etc.). This
suggests two things: (1) the first dimension is related to Osgood, Suci and Tannenbaum's ‘Evaluative'
dimension, and, therefore, (2) judgments related to the stylishness of the excerpts are associated with positive evaluations. The
second dimension was loaded highly with variables that were associated with expressive aspects of the performance, such as
Flexible, Over-Expressive, Romantic and a negative loading from Mechanistic. The third factor did not suggest a clear cut label
to the researchers, and given its relatively weak contribution to the variance it was omitted from further analysis.
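The extraction rule described (retain components whose eigenvalues exceed 1, i.e. the Kaiser criterion) can be sketched without SPSS. This is only a plain principal-components version with unrotated loadings, under the assumption that the input is an observations-by-scales rating matrix; it is not a reimplementation of the authors' analysis.

```python
import numpy as np

def kaiser_components(data):
    """Retain principal components of the scale correlation matrix
    whose eigenvalues exceed 1 (Kaiser criterion).
    `data` is an (observations x scales) matrix of ratings.
    Returns unrotated loadings and per-component explained variance."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)   # standardise scales
    corr = np.corrcoef(z, rowvar=False)                 # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                   # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                                # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals.sum()
    return loadings, explained
```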
The first two factors were investigated further. Given that the theoretical interest was that stylishness is an important dimension
of the perception of baroque music, we tried to explain stylishness response as a composite of other responses. A stepwise
linear regression was performed (using SPSS) in which the Stylishness variable was modelled in terms of the other scales used.
The results of the analysis are summarised in Table 1. The scales which could be used to explain stylishness (at p = 0.05) were
Articulation, Speech-Likeness and, with a negative coefficient, Romanticness. Interestingly, only two evaluative scales (Good
Performance and Well-Ornamented) were entered into the model. The model explained 78.7% of the variance in Stylishness
response. These results suggest that the first factor relates to something more complex than evaluation. Most likely, part of a
perceived good performance is that it is stylish, and stylishness together with judged quality may form an important dimension
of the baroque listening experience.
Table 1. Summary of Stepwise Regression Model of Stylishness
Multiple R .89378
R Square .79885
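A stepwise model like the one summarised in Table 1 can be approximated by greedy forward selection. The sketch below uses an R²-gain stopping rule rather than the p = 0.05 entry criterion the authors report, and all variable names are placeholders.

```python
import numpy as np

def forward_select(X, y, names, r2_gain=0.01):
    """Greedy forward selection, a simplified stand-in for the SPSS
    stepwise procedure: repeatedly add the predictor that most improves
    R^2 until the gain falls below `r2_gain` (an invented threshold)."""
    chosen, best_r2 = [], 0.0

    def r2(cols):
        # OLS fit with intercept on the chosen columns.
        A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        return 1 - resid.var() / y.var()

    while True:
        gains = [(r2(chosen + [c]), c)
                 for c in range(X.shape[1]) if c not in chosen]
        if not gains:
            break
        new_r2, c = max(gains)
        if new_r2 - best_r2 < r2_gain:
            break
        chosen.append(c)
        best_r2 = new_r2
    return [names[c] for c in chosen], best_r2
```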
A similar regression analysis was conducted with the proposed ‘expressiveness' label of the second factor. The resulting
model is summarised in Table 2. This model explained 66.8% of variation in Expressiveness responses and did so in terms of
the Over-Expressiveness, Romantic and not Mechanistic scales (at p = 0.05). In this analysis three evaluative scales were also
included: Good Performance, Good Tempo and Well Accented. In contrast to the factor analysis, the regression analysis
suggests that evaluative scales are used to describe elements of expressiveness and stylishness. However, by taking only the
non-evaluative scales of the factor analysis and the regression analyses together, we tentatively conclude that the stylish
performance is judged primarily according to articulation and speech-likeness. An expressive performance is judged primarily
according to how romantic and unmechanistic it was.
Table 2. Summary of Stepwise Regression Model of Expressiveness
Multiple R .83
R Square .69
Conclusions
This study provides a first step in understanding how listeners interpret baroque performance. While a clear identification of the
theoretical dimensions of stylishness and expressiveness await further investigation, the present study provides musicologists
with an alternative methodology for understanding highly complex performance practice issues from a more objective and
scientifically plausible stance.
A thorough investigation is required in the method of selecting the scales upon which to judge items. Since the terminology
stems from highly sophisticated listeners (musicologists), there will be occasions when the terminology becomes unclear or
interpreted ‘incorrectly' by the more typical listener. For example, in debriefing participants, we discovered that the terms
‘Texture' ‘Detailed' and ‘Well-Accented' were probably used with different shades of meaning than that intended in
musicological literature.
Acknowledgement
The authors are grateful to Kate Stevens and members of the Australian Music Psychology Society (AMPS) for their comments
on an earlier draft of this work.
References
Avison, C. (1752). An essay on musical expression. London
Babitz, S. (1952). A problem of rhythm in baroque music. The Musical Quarterly 38: 533-565
Babitz, S. (1967). Concerning the length of time that every note must be held. Music Review 28: 21-37
Babitz, S. (1970). The great baroque hoax: a guide to baroque music and performance for connoisseurs. Los
Angeles: Early Music Laboratory
Dolmetsch, A. (1949). The interpretation of the music of the 17-18th centuries. London: Novello (1st published
1915).
Donington, R. (1963). The interpretation of early music. London: Faber (revised: 1973, 1989).
Donington, R. (1973). The performer's guide to baroque music. London: Farber
Fabian, D. (1998). J. S. Bach recordings 1945-1975: The Passions, Brandenburg Concertos and Goldberg
Variations: A study of performance practice in the context of the early music movement. Unpublished doctoral
dissertation. The University of New South Wales.
Farnsworth, P. R. (1969). The social psychology of music (2nd ed.). Ames, Iowa: Iowa State University Press.
Gotleib, H. and Koneĕni V. J. (1985). The effects of instrumentation, playing style, and structure in the Goldberg
Variations by Johann Sebastian Bach. Music Perception, 3: 87-102.
Hubbard, Frank (1965). Three centuries of harpsichord making. Cambridge, Mass: Harvard University Press
Kenyon, N. (Ed.) (1988). Authenticity and early music. London: Oxford
Kenyon, N. (1984). The limits of authenticity: a discussion. Early Music, 12: 3-25.
Kivy, P. (1995). Authenticities - philosophical reflections on musical performance. Ithaca: Cornell University
Press
Neumann, F. (1978). Ornamentation in baroque and post-baroque music. Princeton: Princeton University Press
Neumann, F. (1982). Essays in performance practice. Ann Arbor, Mich: UMI Research Press
Neumann, F. (1989). New essays in performance practice. Ann Arbor, Mich: UMI Research Press
Osgood, C. E., Suci, G. J. & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana, IL: University of
Illinois Press.
Rosenstiel, L. (Ed.) (1972). The spheres of music: harmony and discord. Current Musicology 14: 81-172
Schulenberg, D. (1990). Expression and authenticity in the harpsichord music of J. S. Bach. The Journal of
Musicology 8: 449-476
Taruskin, R. (1995). Text and act: essays on music and performance. Oxford: OUP
Wiora, W. (Ed.) (1968). Alte Musik in unserer Zeit: Referate und Diskussionen der Kasseler Tagung 1967.
Musikalische Zeitfragen Vol. 13. Kassel-Basel: Bärenreiter
Appendix. List of Recordings Used in Experiment
Virtuosi of England, directed by Arthur Davison. EMI Classics for Pleasure CFP 40010 (rec.1971) [Brandenburg
No. 4 ‘Allegro' only]
Collegium Aureum, no director listed. Victrola VICS 6023 RCA (rec: 1965) [Brandenburg No. 1 ‘Adagio' only]
Academy of St Martin-in-the-Fields, directed by Neville Marriner. Philips 6700045 (rec. 1971)
Concentus Musicus Wien, directed by Nikolaus Harnoncourt. Telefunken ‘Das Alte Werk' SAWT 9459-60 (rec.
1964)
Sigiswald Kuijken and other soloists, directed by Gustav Leonhardt. Seon RL 30400 EK (rec: 1976-1977)
Concentus Musicus Wien, directed by Nikolaus Harnoncourt. Teldec ‘Harnoncourt Edition' DAW 8.42823 XH /
242925-2 (rec. 1982)
back to index
Proceedings paper
This particular study is concerned with an experimental testing of Deryck Cooke's theory about
emotion in music as expounded in his book "The Language of Music" (1959). In it he claimed that
an analysis of tonal music reveals a consistent use of particular patterns of pitches linked to
quite precise emotional meanings. He argued that these meanings constituted a shared language
available to anyone familiar with the idiom and further, that they arose from tensions inherent
in the relationships between musical sounds arising from their origins in the harmonic series.
Cooke illustrated his thesis by reference to many examples of music where the expressive aim of
the composer could be inferred from an accompanying text. He then extended the argument by
analogy to other, purely instrumental, music. In the central chapter of his book he identified
16 basic melodic patterns derived from this material, the meanings of which, he claimed, could
be reliably understood and agreed.
2. Background
Cooke's theory has a persuasiveness about it which invites serious consideration. Given the
importance we attach to melodic contour in our identification and memory of music, it would
indeed seem to have a reasonable claim to be a significant carrier of emotional meaning.
Further, the sheer quantity and range of the examples Cooke produced in support of his ideas
means that they cannot be lightly disregarded, although some writers have been unimpressed by
this. Zuckerkandl (1960), for example, argued that the musical examples used had been selected
to fit the hypothesis, whereas counter-examples could also be easily found. Langer (1957),
though not referring to Cooke, considered that this kind of methodology simply reveals
conventions in musical usage.
However, Cooke specifically rejects the notion that cultural conditioning can be the whole
explanation for his observations:
".....it is difficult to believe that there is no more to it than that. .....one can only
wonder how certain patterns of tone setting ever came about in the first place to 'correspond
with certain emotional reactions on the listener's part', unless the correspondences were
inherent....." (The Language of Music, pp. 24-25).
Cooke's thesis fits into a model of the relationship between music and emotion which was
current at the time he was writing. Earlier research had established that the perceived
emotional content of music was widely shared among listeners (Schoen & Gatewood, 1927;
Gundlach, 1935; Hevner, 1935a). Hevner, in a series of experiments, (1935b, 1936, 1937),
systematically explored the expressive effects produced by varying specific features of the
music. She concluded that, in order of effectiveness, predictable effects upon the perceived
emotion in music were produced by changes to the tempo, mode, pitch, harmony and rhythm,
although sometimes the influence of one dimension was so strong as to inhibit the impact of
others. Only one of the features she examined, that of melodic direction, had no impact. Given
the significance imputed to melodic direction by Cooke this would seem to undermine one of his
major claims. However, since he makes no attempt to offer any psychological evidence himself,
it is impossible to know how he would have countered that finding.
In more recent years there has been a move away from this essentially static view of the
relationship between music and emotion, particularly since Meyer (1956) drew attention to the
importance of expectation and resolution in the emotional structure of a composition. Interest
has switched to identifying the features which are believed to be implicated in inducing
personal reactions (e.g. Sloboda, 1991). From that approach is developing a broader theory
based on analogies between the dynamics of music and what might broadly be called the human
condition (Sloboda, 1998). There has also been an increasing interest in the role of the
performer and what it is that s/he does which affects how listeners respond (e.g. Gabrielsson
& Juslin, 1996).
At the same time, however, the notion of there being universal components to our perceptions of
music has continued to be argued. Dowling & Harwood (1986), for example, suggest a number of
apparently culturally universal features which appear to derive from underlying properties of
the human information processing system, while a recent study by Papoušek (1996) concludes that
Cooke himself foreshadowed one of the major concerns which Gabriel's methodology raises by
drawing attention to the importance of the holistic nature of the musico-emotional experience,
i.e. if the pitch sequences are divorced from a musical context, the power of the emotional
message will inevitably be lost.
A second problem is highlighted by more recent research findings which have drawn attention to
the importance of the role of the performer with regard to the emotional 'tone' of the music
(Gabrielsson & Juslin, 1996). By removing that human element entirely from the experiment,
Gabriel may again have compromised the result.
A third difficulty arises with his use of sine-wave tones as a stimulus, since the overtones of
the harmonic series, whose presence, Cooke argues, gives emotional tension to the intervals of a
motif, are missing.
Finally, he paid scant regard to Cooke's emphasis on the importance of the listener's musical
sophistication, since there is no evidence that his 22 students were musically experienced in
Cooke's terms.
All these factors suggest that it might be worthwhile trying to devise a fairer test for
Cooke's claims, so as to give the musical motifs the best opportunity to convey whatever
emotional charge they might carry.
5. A Naturalistic Alternative
There are a number of concerns which arise when proposing an experiment which uses real musical
examples as its stimuli rather than specially created ones. The first is the need to control as
much as possible for any variables, other than pitch change and modality, which could affect
the emotional interpretation of music. The second is to ensure that any excerpts used are long
enough to have some musical integrity, while being confined to only one basic term and avoiding
such features as modulations which could cloud both the tonality and the emotional tone of the
passage. A third is that the performances offered should be appropriately stylish without being
mannered. Finally there is the issue of how to focus the listener's response on the basic term
as opposed to the longer contextual passage. In consideration of these and other problems the
following criteria were established:
- The extracts presented should be limited to 19th-century piano music, since that would have
the advantages of restricting timbre and stylistic range while offering a wide choice of
repertoire.
- Suitable performances should be identified using the criteria that they are considered to be
idiomatic and that the recorded sound is of good quality.
- Each extract chosen should contain a clear and tonally unambiguous exposition of just one
basic term.
- The selected motifs should have thematic significance rather than being primarily linking or
passage work and, as far as possible, be similar in terms of general tempo, rhythmic features
and dynamic levels.
- A coherent extract should be played to the participants, followed by two repetitions of the
particular basic term being considered before asking for a response.
- Participants should be provided with a written response sheet which would allow them to
quantify the strength of all of Cooke's descriptors for each extract.
- Participants should be experienced in the tonal idiom.
Test material was prepared and trialled with members of the music psychology unit at Keele
University. The response instrument began with a cover page requesting information on age, sex,
and musical experience, followed by a brief explanation of the purpose of the study and a set
of instructions as to how the task should be completed. Attached to the cover page were further
sheets containing a series of response blanks, one for each extract. These listed Cooke's
descriptions of his basic terms. Listeners were asked to rate them as to how well each matched
their own perception of the emotion of the extract. Ratings were on a scale of 1 (No match) to
5 (Perfect match).
A tape of musical extracts was prepared. Each extract was copied from compact disc using a
Marantz CD-67 player linked to a Denon DRM 550 Stereo Cassette Tape Deck. Each was long enough
to establish a context for the basic term and the notes comprising the basic term itself were
then immediately recorded twice more. Participants completed the response sheets while
listening to the tape and were then invited to give feedback both about the experiment and the
rationale behind it. Subsequently a number of revisions were made, the most important being
that the number of basic terms to be tested was reduced from 16 to 9 so as to eliminate
redundancy caused by some terms being musically and descriptively close to one another. The
final selection of the basic terms, together with their musical exemplars, is given in Table 1
below.
Table 1: Basic Terms Selected for the Test
6. Test Methodology
In order to ensure a sample which would be experienced in the tonal idiom, several groups of
adult amateur instrumentalists were contacted. Three groups agreed to take part in the test, as
did a number of music students at a Midlands conservatoire. Numbers were as follows:
The volunteers were each given a copy of the response sheets and the page of explanations and
instructions was read through with them verbatim. An opportunity was given to ask questions.
These were rare and none revealed any confusion about the nature of the test or the
instructions. For the final test, four random sequences of recordings had been prepared. Each
sequence began with the opening bars of Chopin's Ballade in A flat major followed by two
repeats of the first 8 notes as a trial 'basic term'. One of the four sequences of recordings,
previously assigned at random to that group, was then played. Of the 34 participants in the
sample, 9 heard the first sequence of extracts, 14 the second sequence, 7 the third sequence
and 4 the fourth sequence.
The tape was stopped after each item and re-started after one minute. Participants were warned
10 seconds before each new sequence was about to start. Visual observation suggested that this
time was more than adequate for all responses to be completed. The tape was played on a Bush MC
113 CD Micro System provided by the writer. The whole test from initial explanation and
distribution of the response sheets through to their final collection took about 20 minutes.
Many participants expressed interest in the task and in the outcome of the test.
On the completion of all the tests the results were transcribed from the response sheets so
that scores for each basic term were collated, regardless of which particular sequence had been
heard. The raw scores were entered into an SPSS spreadsheet for analysis.
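The collation step described above, pooling scores for each basic term regardless of which presentation sequence a participant heard, can be sketched as follows. The records and names used here are purely illustrative; they are not the study's raw data:

```python
from collections import defaultdict

# Each response record: (sequence heard, basic term, descriptor, rating 1-5).
# These records are illustrative placeholders, not the study's data.
responses = [
    (1, "term3", "statement3", 5),
    (2, "term3", "statement3", 4),
    (4, "term3", "statement3", 5),
    (1, "term2", "statement4", 3),
]

# Pool ratings per (basic term, descriptor), ignoring which sequence was heard.
pooled = defaultdict(list)
for sequence, term, statement, rating in responses:
    pooled[(term, statement)].append(rating)

# Mean rating per basic-term/descriptor pairing, ready for analysis.
means = {key: sum(vals) / len(vals) for key, vals in pooled.items()}
```

The point of the pooling is that the four randomised sequences become irrelevant once scores are keyed only by basic term and descriptor.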
7. Results
The results of the tests were analysed first to provide evidence for the hypothesis that it is
possible to detect significant relationships between 'real music' examples of Cooke's basic
terms and his verbal characterisations of them.
A failure to find such a significant relationship, however, need not mean that the results have
nothing to say. While Cooke's theory makes claims specifically for the emotional charge
generated by tonal relationships, he also recognises three component elements in their make-up.
One is modality: major motifs express positive emotions, minor ones express negative emotions.
A second is the direction of the movement: rising motifs are perceived as representing outgoing
or assertive emotions while falling motifs denote passive, receptive ones. A third is the
terminal point of the motif: those ending on the tonic are seen as expressing fulfilment or
completion while those moving away from the tonic lack that element. The responses were
therefore analysed to see if there is any evidence to support these components of the basic
terms.
It also seemed worthwhile investigating whether responses varied according to sex or to
previous familiarity with the specific musical examples chosen. Given the number of analyses to
be performed upon the data it was decided to consider only those factors which generated a
probability of less than 1% as being significant.
a) Overall Responses
First and foremost the results showed that there was only one extract (No. 3) where the
predicted definition was rated highest. The conclusion therefore is that Cooke's basic terms
have not been shown to convey the precise meanings he ascribed to them. However, the mean
rating of the statements did vary markedly for many of the extracts so further analyses were
undertaken to explore the effects of modality, contour and final note.
b) Modality
Scrutiny of the results suggested that, except for extract 8, the means fell into two clusters
in which the valency of the descriptors varied according to the modality of the music. Thus in
extract 1 (major) the most highly scored statements were the positively valenced ones. By
contrast extract 2 (a minor basic term) produced the strongest mean scores for the negatively
valenced descriptors.
In order to confirm this apparent relationship, the responses were aggregated and subjected to
an ANOVA test, which proved significant at the p < 0.001 level. So, insofar as Cooke's
explanations for his theory include a modal dimension, this result offers some confirmation of
them.
The motifs in minor keys produced similarly unconvincing evidence for the relationship between
active/passive emotions and rising/falling themes.
d) The position of the tonic
(i) Major mode
The significance or otherwise of the tonic termination of a basic term was examined by
comparing the responses to extracts 3 & 7. Cooke's theory would predict that statement 3
(Joyful acceptance .......fulfilment or homecoming) would be preferred for the former, and
statement 7 (Confident incoming emotion of ....anticipation or expectation) for the latter. The
actual results showed that statement 3 was the preferred descriptor for both extracts. In the
case of the basic term which ends on the tonic (Extract No. 3) the difference is not
significant, whereas there is a small but significant difference at the p = 0.025 level for
Extract No. 7 (i.e. the 'wrong way').
(i) Sex
There were significant differences in the responses of men and women to extract 1 (p = 0.001)
and to extract 8 (p < 0.001). Scrutiny did not reveal any consistent pattern in the former
case. However, in extract 8, which had already been shown to be atypical in the relationship
between tonality and valency of emotion, women consistently responded more favourably to the
positive statements and less favourably to the negative statements than did the men in this
sample. Possible reasons for this will be explored below.
(ii) Experience
Participants were chosen with regard to their practical experience of working in a tonal idiom.
Analysis confirmed that differences in the length of this experience were not a significant
factor.
(iii) Familiarity with the Extracts
One final possible influence on the participants' responses was examined, namely whether or not
any previous familiarity with the music might have influenced the results. Here again analysis
revealed no significant differences.
g) Rank Order of Descriptors
It was observed that certain descriptors seemed to be more frequently chosen than others. The
responses to the major basic terms favoured statement 3 above all the others, while the
responses to the minor motifs favoured statement 4 (though Extract 8 is an exception [see
below]).
At the other end of the scale, statement 1 was the least favoured descriptor for the major
extracts and statement 9 for the minor ones. The likeliest reason for this, and for the
The sex of the respondent may have some influence on the perception of specific emotional
characteristics in music.
8. Discussion
The core finding of this study is that the precise emotional connotations which Deryck Cooke
ascribed to certain musical basic terms were, with one exception, not agreed by a sample of
listeners experienced in the western tonal idiom. That is not to say that there were no
agreements about the emotional tone of the extracts but these appear to be influenced by
factors other than the tonal tension upon which Cooke built his hypothesis.
Principal amongst these factors is the modality of the fragments. In most cases the findings
show a significant link for listeners between positive affect and major modality, and between
negative affect and minor modality, which is in accordance with many studies made over the
years (e.g. Hevner, 1935; Rigg, 1964). Although clearly not supporting the detail of Cooke's
claims, it could be argued that this linkage is evidence of an inherent mechanism at work.
Various studies have set out to establish whether or not there is an acoustic foundation for
the association between modality and emotion and also the age at which it reliably begins to
appear. The evidence is contradictory because of the different frameworks which have
been used for such enquiries. Kastner & Crowder (1990), working with children ranging in age
from 3 years to 12 years, showed that even the youngest were able to discriminate and register
culturally shared emotional perceptions of the two modes. They speculated that the reason why
such young children were able to recognise this link is to do with the relative familiarity of
the major and minor modes. They referred to hypotheses by Zajonc (1984) that mere exposure to a
stimulus leads to familiarity with it and that, in all organisms, familiar stimuli tend to be
preferred, novel stimuli feared. Familiarity in this case, the writers argued, could be to do
with the presence of the major triad as easily audible harmonics early in the overtone series,
which would support the notion of an inherent perceptual component in the modal-emotional
linkage. However, the link between familiarity and preference is contradicted in the musical
field by Hargreaves and Castell (1986) who reported that 4/5 year old children did not
discriminate in their preference ratings between familiar and unfamiliar melodies because it
takes time to develop the cognitive maturity necessary to link familiarity with liking. This
suggests a developmental process.
A further difficulty lies in the fact that even this powerful influence is not always
over-riding. As has already been discussed, extract 8, the F minor study, was perceived as
primarily having positive affect. This suggests that such factors as a fast tempo or a
distinctive rhythm can sometimes be more significant in moulding perceptions than can
mode or any particular tonal relationships between the pitches. Indeed, Hevner (1937) concluded
that tempo was the most important dimension of all for carrying the emotional connotation of
music, though she warned against assuming any simple correspondence between any one factor and
a particular emotional tone.
Despite Cooke's many references to instances where the rising and falling of melodies, to and
from the tonic, has particular emotional resonances, the factors of musical contour and
key-note position were found not to be significant in this experiment. This is consonant with
developmental studies, which do not support the notion that these elements are perceived in
Reference was made earlier to the identification by Dowling & Harwood (1986) of certain
'universals' in music. They also consider its adaptive value in terms of evolution and suggest
that it can be explained by reference to its value to human groups rather than necessarily to
individuals, allowing them to express shared experiences, values and cultural identity which
are important for survival (op. cit., p. 236). The work of Papoušek (1996) quoted earlier draws
attention to its value in terms of individual social development. Cross (1998) has suggested
the outlines of a developmental theory which would encompass both these elements. Perhaps
future work in such areas as ethnomusicology or neuromusical research will yet reveal important
inherent features of music to help explain why it is so all pervasive in the cultures of the
world.
REFERENCES
Abeles, H. F. & Chung, J. W. (1996). Responses to Music. In: Hodges, D. A. (Ed.), Handbook of
Music Psychology (2nd edn.). IMR Press, UTSA.
Cooke, D. (1959). The Language of Music. Oxford University Press.
Cross, I. (1998). Is Music the Most Important Thing We Ever Did? Music, Development and
Evolution. Proc. 5th Int. Conf. on Music Perception & Cognition, 35-40.
Dowling, W. J. & Harwood, D. L. (1986). Music Cognition. Academic Press.
Gabriel, C. (1978). An Experimental Study of Deryck Cooke's Theory of Music and Meaning.
Psychology of Music, 6(1), 13-20.
Gabrielsson, A. & Juslin, P. N. (1996). Emotional Expression in Music Performance: Between the
Performer's Intention and the Listener's Experience. Psychology of Music, 24(1), 68-91.
Gerardi, G. M. & Gerken, L. (1995). The Development of Affective Responses to Modality and
Melodic Contour. Music Perception, 12(3), 279-290.
Giomo, C. J. (1993). An Experimental Study of Children's Sensitivity to Mood in Music.
Psychology of Music, 21(2), 141-162.
Gundlach, R. H. (1935). Factors Determining the Characterization of Musical Phrases. Amer. J.
Psychol., 47, 624-643.
Hargreaves, D. J. & Castell, K. C. (1986). Development of Liking for Familiar and Unfamiliar
Melodies. Reported in The Developmental Psychology of Music. C.U.P., pp. 117-118.
Hevner, K. (1935a). Expression in Music: A Discussion of Experimental Studies and Theories.
Psychological Review, 42, 186-204.
Hevner, K. (1935b). The Affective Character of the Major and Minor Modes in Music. Amer. J.
Psychol., 47, 103-118.
Schoen, M. & Gatewood, E. L. (1927). The Mood Effects of Music. In: Schoen, M. (Ed.), The
Effects of Music. Kegan Paul, Trench, Trubner & Co., Ltd., London.
Sloboda, J. A. (1991). Musical Structure and Emotional Response: Some Empirical Findings.
Psychology of Music, 19, 110-120.
Sloboda, J. A. (1998). Does Music Mean Anything? Musicae Scientiae, 11(1), 21-31.
Sundberg, J. (1993). How Can Music be Expressive? Speech Communication, 13, 239-253.
Trehub, S., Schellenberg, G. & Hill, D. (1997). The Origins of Music Perception and Cognition:
A Developmental Perspective. In: Deliège, I. & Sloboda, J. (Eds.), Perception & Cognition of
Music. Psychology Press.
Zajonc, R. B. (1984). On the Primacy of Affect. American Psychologist, 39, 117-123.
Zuckerkandl, V. (1960). Review of "The Language of Music" by Deryck Cooke. Journal of Music
Theory, 4(1), 104-109.
Proceedings abstract
THE RELATION OF MELODY AND TEXT IN JAPANESE POPULAR SONGS: AN EXAMINATION BASED
ON A COLOUR-CHOICE METHOD AND A COMPARISON OF GENERATIONS.
UENO-GAKUEN UNIVERSITY,
DZE04250@nifty.ne.jp
Aims:
Songs consist of two components: melody and text. The purpose of this study was
to ascertain and compare the influences of image on the two components.
Results:
1) 30% of the colour responses for melodies and texts were chosen in common from the same
colour categories, such as "warm colour", "cool colour", "neutral colour" and "muddy colour"
(Kawakaki et al., 1994).
2) There was a stronger relation of image between text and song than between melody and song.
Proceedings abstract
EMOTIONAL PROTOTYPES IN MUSIC
Some empirical findings on how basic emotions are expressed in musical structure
Kari Kallinen, Department of Musicology, University of Jyväskylä
Background.
It has been suggested that qualities of musical syntax are central to the expression or
communication of different kinds of emotions. However, there seem to be only a few studies
providing empirical evidence to support this claim.
Aims.
This study examines how basic emotions with different intensity levels are expressed in the
structure of (tonal) Western art music.
Method.
Music listeners were asked to choose the basic emotional facial expression (joy, sadness,
anger, fear, surprise, disgust) corresponding to the emotional character of the recorded music
sample heard, and to evaluate the intensity of the emotion by marking a cross on an intensity
segment. Passages considered the best examples of basic emotions (those for which consensus
among the chosen facial expressions was high) were then subjected to structural analysis.
Features such as loudness, tempo and complexity of harmony ("iconic" meaning), and sudden
dynamic or textural change and new or unprepared harmony ("symbolic" meaning), were examined.
The combinations of structural features related to distinct emotions and intensities were then
considered as prototypes.
Results.
The results seem to give some support to the claim that emotional qualities in music are
qualities of musical syntax. However, the final results of this study will be presented at
Keele in 2000, since at the time of submission the analysis was not fully completed.
Conclusions.
The study suggests that expressions of basic emotions (or at least the recognition of basic
emotional characters) have some analogy with visual pattern recognition: a figure can be
perceived when all, or some, of the most important partials are present.
Proceedings paper
Charlie Parker and the Golden Section : An Examination of Musical Proportion in the Released and
Alternate takes of "An Oscar for Treadwell"
B. Kenny
ABSTRACT
This paper examines Charlie Parker's contrasting approaches to musical pacing in consecutive takes of "An Oscar For Treadwell." Following a brief analysis of recurrent thematic material common to both takes,
incorporating a discussion of Parker's complex approach to motivic deconstruction, a proportional examination of musical pacing is presented. Four nodes corresponding to Golden Section proportions were
chosen along the 64 bar time lines for each take. At each of these nodes, the Released Take was compared with the Alternate Take in respect to motivic choice, melodic contour, rhythmic impetus and overall
successful dramatic realisation of each node. Results show a close relationship between significant musical events articulated by the Released Take and Golden Section proportions. It is suggested that one of the
key factors which contribute to the organic unity found in Parker's best work is his ability to conceive of improvisation in long range terms. With all the inherent drama, logic and fluidity of a musical
conversation, Parker is able to generate and sustain rhetorical tension throughout each multi-chorus improvisation, although to a greater extent in the Released Take. Such rhetorical constructs assist Parker in
overcoming artificial formal and musical divisions implicit in the 32 bar chorus song forms he improvises on, a characteristic which distinguishes his playing from other "chorus to chorus" bebop contemporaries.
PROPORTIONAL ANALYSIS
The main issues to be addressed in the following discussion and overview of proportional analysis can be summarised as follows:
1. What is proportional analysis and, in particular, the Golden Section ratio?
2. What analytic advantages does this methodology bring to a greater psychological understanding of such ephemeral but important concepts as rhetoric and pacing in improvisation?
3. What is its history of application to both notated and improvised musics?
4. What is the likelihood of an improviser such as Charlie Parker consciously articulating its aesthetic properties?
One of the primary recommendations for any form of proportional analysis is its very generality, namely that such analyses necessitate a discussion of many interrelated musical parameters at the chosen
proportional moments in a given work. Furthermore, proportional analyses acknowledge the important temporal aspects of music as a series of discrete linear events. The identification of these events and the
manner in which they are prepared and resolved often holds the key to understanding a work's inherent structural drama. In a genre perhaps more indebted to concepts of spoken 'rhetoric' than notated music,
such analyses may also illuminate the improviser's unique approach to the original structure.
The GS ratio involves the division of a line, an area or a musical work into two parts, "so that the ratio of the shorter portion to the longer portion equals the ratio of the longer portion to the entire length"
(Howat, 1983, p. 2). The exact value of the ratio is irrational, approximating 0.618034, or a little less than 2/3 of the entire length measured (Figure 1).
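The defining proportion can be checked numerically; a minimal sketch (the variable names are illustrative):

```python
import math

# phi satisfies short/long == long/(short + long); solving the quadratic
# gives phi = (sqrt(5) - 1) / 2, an irrational number close to 0.618034.
PHI = (math.sqrt(5) - 1) / 2

total = 1.0                  # any length will do; 1.0 keeps the arithmetic plain
long_part = total * PHI
short_part = total - long_part

# The two ratios in Howat's definition agree (to floating-point precision):
ratio_a = short_part / long_part
ratio_b = long_part / total
```

The equality of the two ratios is exactly the self-similar property that makes the division "golden": the whole relates to the longer part as the longer part relates to the shorter.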
Most often, discussions of GS proportions have focused on musical form (i.e. AABA, ternary, sonata), which is either viewed to be underpinned by the ratio or exists in a complementary relationship to it. As
Dorfman (1986, p. 20) notes, GS divisions either coincide with major formal divisions in a work or, as is more common, with major turning points. In the compositions of Debussy, for example, Howat (1983)
demonstrates how GS proportions give shape to and make sense of the composer's smaller episodic divisions. These overriding proportions account for significant thematic events which defy explanation by more
conventional analytic methodologies.
Musical works have the potential for at least two major GS divisions, which occur roughly 2/3 from the beginning (Long GS) or 2/3 from the end of the work (Short GS). These two main GS divisions can
likewise subdivide existing sections or be subdivided themselves in a seemingly infinite number of ways (Figure 2).
Naturally, the greater the number of such subdivisions, the more likely it is that they will either be confused with the primary GS for the work or be subsumed under a consideration of other formal divisions.
While GS proportions have received greater attention in the visual arts, the relationship between logarithmic proportions and music has been an enduring and intriguing one. Throughout history, music theorists
and composers have made connections between this dynamic ratio which can be found in nature and similar proportions in music. In Ancient Greek music theory, GS proportions played a key role in the
Pythagorean and Euclidean formulation of the harmonic overtone series (Dorfman, 1986, p. 26-39). Logarithmic proportions are likewise present in other theoretical constructs of music, such as the relationship
between loudness and sound intensity and, to an extent, in the traditional Western system of proportional notation (Dorfman, 1986, p. 54).
In the absence of verifiable anecdotal evidence, various musicologists have noted a remarkable tendency in the works of Western composers towards GS proportions. These proportions either coincide with larger
formal divisions or with other significant musical events which cannot be accounted for by static form alone. Authors with more reliable anecdotal evidence have also examined the intentional use of GS proportions as a
compositional device in the works of Debussy (Howat, 1983), Satie (Adams, 1996) and Stravinsky (Watkins, 1994). To this author's knowledge, the ratio has never yet been applied to a dedicated study of
proportion in improvisation.
What is the likelihood of a jazz improviser and composer such as Charlie Parker deliberately employing GS proportions as a compositional or improvisatory method? Parker's intentional use is probably highly
unlikely, for several reasons which will be addressed at the conclusion of this paper. If not ignorant of GS proportions (one cannot dismiss Parker's acquaintance with them out of hand), it is improbable that
Parker would have intentionally wished to straitjacket his improvisation in this manner. As for any subconscious applications of the ratio, one enters the realms of speculation and hypothesis. Discussing evidence
METHODOLOGY
The two works chosen for analysis were Charlie Parker's improvisations on his own composition "An Oscar For Treadwell." Both the Released and Alternate takes were recorded consecutively at Mercury
Studios in New York City on June 6, 1950. The musicians on the date were Dizzy Gillespie (trumpet), Thelonious Monk (piano), Curly Russell (bass) and Buddy Rich (drums). The particular analytic focus for
this paper is Parker's two improvised choruses (bars 41 to 104) and their relationship to the composed Head melody which precedes them (Figure 3).
The 'head' or composed melody for this song was most likely a contrafact, meaning that Parker composed a new melody over the chord changes of an existing jazz standard. While it could be argued that one
should ultimately look for similarities between the original jazz standard and the resulting improvisation, such an approach poses several problems. First, it is often difficult to establish beyond doubt which tune
the changes were originally appropriated from. Second, this exercise is rather pointless in that countless jazz standards share the same or very similar sets of chord changes - the most famous set being those based
on Gershwin's "I Got Rhythm", usually referred to as the "Rhythm" changes. For these reasons, I opted for Parker's composed head which bookends the improvisations of this song.
In each take, Parker's improvised choruses (64 bars in total) articulate two cycles of the 32 bar Head. The Head itself can be subdivided into four 8 bar phrases which express an AABA form.
These clear formal divisions implicit in the Head meant that all A and B sections could be analysed in respect to their A or B section groups, irrespective of the take they originally came from. With the A and B
sections for both takes combined, this brought the total number of A section motives to twelve and B section motives to four (Figure 4).
Within each of the section groups, motives were further subclassified according to their thematic material. For example, A Section motives incorporating the notes of the theoretical 'Blues' scale were labelled
'Blues' motives. Following this identification and classification procedure, four nodes corresponding to Golden Section proportions were chosen at corresponding points along 64 bar time lines for both takes (bar
41 to bar 105). These four nodes, ranging from greater to lesser significance, are detailed in Figure 5.
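Node positions of this kind can be computed directly from the GS ratio. The sketch below maps the Long GS, the Short GS and one subdivision on either side onto absolute bar numbers; this particular choice of subdivisions is illustrative only, not a reproduction of the paper's Figure 5:

```python
PHI = 0.6180339887  # Golden Section ratio

def gs_nodes(n_bars, first_bar):
    """Return four GS-derived node positions as absolute bar numbers.

    Uses the Long GS (~61.8% from the start), the Short GS (~61.8% from
    the end) and one subdivision on either side of them. The choice of
    subdivisions is an illustrative assumption.
    """
    long_gs = n_bars * PHI
    short_gs = n_bars * (1 - PHI)
    sub_short = short_gs * PHI                     # subdividing the opening span
    sub_long = long_gs + (n_bars - long_gs) * PHI  # subdividing the closing span
    return [round(first_bar + x) for x in (sub_short, short_gs, long_gs, sub_long)]

# Candidate nodes along a 64-bar solo beginning at bar 41:
nodes = gs_nodes(64, 41)  # [56, 65, 81, 96]
```

Because the ratio is self-similar, each span can be subdivided again in the same way, which is how the "seemingly infinite" nesting of GS divisions mentioned above arises.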
At each of the four nodes, the Released Take is compared with the Alternate Take in respect to motivic usage, melodic contour, pacing and overall successful dramatic realisation of each node.
ANALYSIS
CONCLUSIONS
Throughout this paper, it has been argued that the Released Take's greater conformity to idealised GS proportions demonstrates a greater awareness of pacing than the Alternate Take. It must be stated, however,
that this issue of pacing may represent only one of the reasons why the Released Take was ultimately chosen for release. Although Parker may have personally participated in the final editing and selection
procedure, it is more likely that release decisions would have ultimately been made by the record producer. Furthermore, issues other than pacing may have determined the decision outcome. These may include
factors affecting the musical quality of the performance (e.g. miscued notes, the relative merit of the sidemen's solos, the tempo of each take, the arrangement of the Head, etc.) or those affecting the sound
recording process (e.g. balance, intrusive noises, etc.). In spite of these important variables, it cannot be altogether discounted that the release decision process may have been influenced by issues of pacing and
proportion.
Another important point is that, in all probability, Parker would only have been aware of GS proportions at an intuitive and subconscious level. It is doubtful whether Parker would have been able, or would even
have wanted, to employ GS proportions with any degree of precision. He may, however, have acquired them intuitively, and three equally plausible hypotheses suggest how.
The first is that, as a consummate jazz improviser, Parker would certainly have understood the importance of melodic rhetoric and how to structure it to achieve optimal communication. Through years
of experience, experimentation and self-critical feedback, Parker would have also realised the importance of building and releasing tension across large formal entities. His very attraction to such proportions as
the Golden Section may be explained simply as a rational expression of a generalised arch curve, one which prolongs tension for the majority of a work and leaves sufficient time to resolve it.
The second hypothesis is that Parker would have in time internalised many of the jazz Head arrangements he improvised on to the extent that their proportions could not but influence his improvisations. As a
composer and regular performer of jazz standards, Parker would instinctively come to know where the important moments of most tonal works occur. In well constructed song forms, these peaks would be
especially articulated in the Head arrangement's melodic contour, harmony, rhythmic figuration, dynamics, texture, articulation, etc. The argument here is that if one plays and improvises on enough of these
composed song forms, one cannot help but internalise their logic.
The third and final hypothesis is that Parker may have acquired GS proportions from nature. As remarked earlier, GS proportions have been uncovered in the works of many Western art music composers who
may never have intentionally employed them. Although musicologists have generally refrained from drawing firm conclusions as to why these proportions occur, it may be construed that many of these
occurrences are in fact manifestations of the ratio as a naturally occurring phenomenon. It should come as no surprise, therefore, that Parker, a very successful exponent of a genre which owes much of its
impetus to a spontaneous rhetoric, may have intuitively constructed his musical argument within GS proportions.
This analysis of Parker's successful and less successful pacing of the same work bears testimony to his improvisational genius and also hopefully demonstrates, on a small scale, a great improviser's work in
progress. Questions surrounding Parker's intentional or subconscious use of GS are perhaps irrelevant, since there will never be sufficient anecdotal evidence to substantiate any of the assertions
made in this paper. That GS proportions are nevertheless better adhered to in the Released Take is evident in the music itself.
ACKNOWLEDGEMENT
The author of this paper is indebted to Associate Professor Gary McPherson (University of NSW) for all his advice, enthusiasm and critical feedback, without which the writing of this paper would have
been impossible.
BIBLIOGRAPHY
Adams, C. (1996). "Erik Satie and Golden Section analysis." Music & Letters, 77(2), pp. 242-252.
Bauer, S. (1994). Structural Targets in Modern Jazz Improvisation: An Analytic Perspective. Ph.D. diss., University of California, San Diego.
Berliner, P. (1994). Thinking in Jazz: The Infinite Art of Improvisation. Chicago: University of Chicago Press.
Blanq, C. (1977). Melodic Improvisation in American Jazz: The Style of Theodore "Sonny" Rollins, 1951-1962. Ph.D. diss., Tulane University.
Clarke, E. F. (1988). "Generative processes in music." In J. A. Sloboda (Ed.), Generative Processes in Music: The Psychology of Performance, Improvisation and Composition. Oxford: Clarendon Press, pp. 1-26.
Dorfman, A. (1986). A Theory of Form and Proportion in Music. Ph.D. diss., University of California, Los Angeles.
Ernest, G. (1903). "Some Aspects of Beethoven's Instrumental Forms." Proceedings of the Royal Musical Association, Twenty-Ninth Session 1902-1903, pp. 73-98.
Ford, A. (1997). Illegal Harmonies: Music of the 20th Century. Sydney: Hale and Iremonger.
Forte, A. (1973). The Structure of Atonal Music. New Haven: Yale University Press.
Howat, R. (1983). Debussy in Proportion. Cambridge: Cambridge University Press.
Järvinen, T. (1995). "Tonal Hierarchies in Jazz Improvisation." Music Perception, 12(4), pp. 415-437.
Johnson-Laird, P. N. (1991). "Jazz Improvisation: A Theory at the Computational Level." In P. Howell, R. West and I. Cross (Eds.), Representing Musical Structure. New York: Academic Press, pp. 291-325.
Kernfeld, B. (Ed.). (1988). The New Grove Dictionary of Jazz. New York: Macmillan.
Kernfeld, B. (1981). Adderley, Coltrane and Davis at the Twilight of Bebop: The Search for Melodic Coherence (Volumes I and II). Ph.D. diss., Cornell University.
Kramer, J. (1995). "Beyond Unity: Toward an Understanding of Musical Postmodernism." In E. Marvin and R. Hermann (Eds.), Concert Music, Rock and Jazz Since 1945: Essays and Analytic Studies. New York: University of Rochester Press.
Martin, H. (1996). Charlie Parker and Thematic Improvisation. Lanham: Scarecrow Press.
Marvin, E. (1995). "A Generalization of Contour Theory to Diverse Musical Spaces: Analytical Applications to the Music of Dallapiccola and Stockhausen." In E. Marvin and R. Hermann (Eds.), Concert Music, Rock and Jazz Since 1945: Essays and Analytic Studies. New York: University of Rochester Press.
Marvin, E. and Laprade, P. (1987). "Relating Musical Contours: Extensions of a Theory for Contour." Journal of Music Theory, 31(2), pp. 225-267.
Marvin, E. and Hermann, R. (Eds.). (1995). Concert Music, Rock and Jazz Since 1945: Essays and Analytic Studies. New York: University of Rochester Press.
McCreless, P. (1997). "Rethinking Contemporary Music Theory." In D. Schwarz, A. Kassabian and L. Siegel (Eds.), Keeping Score: Music, Disciplinarity, Culture. London: University Press of Virginia.
Monson, I. (1996). Saying Something. Chicago: University of Chicago Press.
Morris, R. (1993). "New Directions in the Theory and Analysis of Musical Contour." Music Theory Spectrum, 15(2), pp. 205-228.
Norden, H. (1968). Form: The Silent Language. Boston: Branden Press.
Proceedings abstract
mmishra@siue.edu
Background:
Aims:
The purpose of this study was to determine whether altering environmental
context affected performance accuracy of memorized music.
Method:
Results:
Conclusions:
Proceedings paper
The Problem
The overall meaning of the Seventh Symphony of the Austrian composer Gustav Mahler (1860-1911) has
consistently baffled analysts and musicologists. It used to be thought of as one of the weakest of the
composer's symphonies. The English musicologist Deryck Cooke said: 'The Seventh is undoubtedly the
Cinderella among Mahler's symphonies.'(1) He was particularly critical of it: 'The truth is that
No.7...presents an enigmatic, inscrutable face to the world...one which arouses suspicions as to its
quality.'(2) James L. Zychowicz noted the criticism: 'It is rare, indeed, when an international symposium
is devoted to a controversial - and sometimes castigated work - such as the Seventh Symphony of Gustav
Mahler.'(3) It was this symposium that threw much light on the nature of the work and particularly its
individual parts. The symphony's overall 'meaning' is something that has troubled many commentators.
Typical is the opinion of Peter Franklin: 'The Seventh Symphony (1904-5) makes use of as wide a range
of allusive musical imagery as any of his works, while remaining mysteriously canny about its cumulative
meaning.'(4) Henry-Louis de La Grange voiced similar thoughts: 'Not only is it accompanied by no
"programme" that would allow its meaning to be deciphered, but, unlike the other Mahler symphonies,
it does not seem to have a grand design, a general purpose capable of justifying the overall plan and
the strangeness of its detail.'(5) La Grange also gave extensive and sympathetic consideration to
the possibility of a programme, citing the influence of the poet Eichendorff and the ideas of Peter
Davison, Peter Revers, Willem Mengelberg and Alphons Diepenbrock without coming to any firm
overall conclusions.(6)
It is an article of many writers' faith that while the middle movements (II, III and IV) are among the
composer's most attractive creations, the first and last movements, for one reason or another, fail to
convince. The second movement, entitled Nachtmusik, gives a vivid picture of nocturnal activities, horn
calls, sinister marches with whirling counterpoints, birdcalls and even screams. The fourth movement,
also entitled Nachtmusik, is superficially a charming serenade, complete with mandolin and guitar. It has
an engaging character with its memorable melodies, simple repetitive rhythms and chamber-music like
orchestral textures. The central third movement, the scherzo, is also very nocturnal in character. Its
heading Schattenhaft (shadowy) clearly indicates this. The way that the accompaniment is built up in the
bass register, with timpani, cellos and basses, bass clarinet and horns, before the violins' whirligig runs
emerge, is typical. The volume, apart from a couple of violent outbursts, is kept low and the textures are
mostly delicately scored. It would appear to be generally uncomplicated.
The problem seems to involve the outer movements, the first and last. In various ways they are thought
not to be convincing. Let us look at these in turn. Various analysts consider the first movement
somewhat diffuse in form. The slow introduction is integrated thematically into the allegro section, a
feature which should cause no problems, but the central part of the allegro appears to some to be a series
of sections of episodic character. There are sections which exemplify Adorno's 'suspension of time' and
frequently the music seems to lose its momentum. One can compare this movement to the equivalent one
of the Sixth Symphony. The latter's fairly straightforward classical sonata structure is not difficult to
follow, with the composer making his points in an orderly and traditional way, even if the content is
powerful and imposing. The first movement of the Seventh Symphony contains themes with a family
resemblance to those in the Sixth Symphony, but they are handled far less formally and much more
flexibly and, probably for Mahler, intuitively. The finale presents even more serious problems. It is said to
be untypical of the composer, especially coming after the three very evocative night movements. The
almost forced joyous character of the movement comes with something of a jolt after the delicate
serenade. The main thematic material is uncommonly four-square for Mahler and has an uncanny
resemblance to the prelude to Wagner's Die Meistersinger. Rather than point an accusing finger at
Mahler, we should rather try to understand what is happening.
The polarisation of favourable opinion concerning the central movements on the one hand and the critical
opinion of the outer movements on the other hand suggests that Mahler's inspiration was in some way
faulty or that the work itself may have been misunderstood by its audiences. Mahler himself, always his
severest critic, was reportedly satisfied with the work and no less a musician than Arnold Schoenberg was
very impressed by the work.(7)
It is my contention that the overall significance or meaning of the work is lost if one concentrates too
closely on the individual parts, attractive though they are. It is a symphony not a suite and as such it can
be expected to present a coherent and unified message. Further, the traditional methods of analysis,
especially if used singly, are unlikely to illuminate the richness of the work. As Mahler's music responds
to approaches from so many different angles and perspectives, surely it is sensible to take a number of
different viewpoints together to see if the 'meaning' of the work can be better uncovered.
Analytical Methods
The music of Mahler is particularly rich in its features. Much more than the music of, say, the Classical
period, it can be viewed in a number of different ways. Consequently it is reasonable to believe that there
are many different analytical approaches which are valid for this music. Let us look at these in turn.
The traditional approach to music often uses some type of formal analysis. One can thus take a standard
model of a symphony, usually from the Classical period, and map this model on to the work being studied.
The relationship that this reveals between the two can be considered in a number of ways: themes, keys,
relative lengths of the 'standard' sections are probably the most important. This form of comparison to a
notional model can of course be very revealing. On the most simplistic and obvious level, points of
similarity indicate a valid contact, while differences show the divergence from tradition. The broad
conclusion that one can arrive at is that one can recognise some common characteristics, even though
there are very many divergences.
One feature which suggests to us that Mahler's music does not adhere strictly to classical procedures is
the appearance of ideas that indicate some overt extra-musical influence. This comes in two forms. The
first is basically musical in nature, involving the complex use of dance forms and marches. Dances are
not restricted to consistent and uniform tempos and styles. They are organic features that are constantly
being developed and changed. The composer's use of marches and march-like music has a similar variety
in its use; it is equally liable to change its character, without any obvious internal musical explanation.
The second is the use of features that do not normally find a place in music, for example, birdsong, and
the sounds of cowbells, that hint at something outside the normal range of music. This imagery is a potent
feature in Mahler's music.
It is not difficult to imagine that some kind of narrative underlies the music. Although the idea of
narrative in music has generated a great deal of controversy, following Adorno, it has gained a wide
currency in discussions of Mahler's symphonies. The questions one should ask are: in what way can
music 'narrate', and, if it can, does this process apply to the music of Mahler?
We now move naturally into the area of musical hermeneutics, the study of the meaning of music, the
ultimate aim of this study. Hermann Kretzschmar, one of the pioneers of this type of study, rejected the
purely formal conception of music advanced by Hanslick and the so-called
Formalists, but he also rejected the poetising descriptions of much of the music writing of his day. He tried to
work out the real emotions which, he argued, were inherent in the music itself, drawing on biographical
and general historical data to support his explanations.(8)
In Mahler's case there is indeed much biographical information, including the diary and notes of Natalie
Bauer-Lechner, the letters and diaries of his wife, and his own letters. Detailed biographical work has
been done by Donald Mitchell and especially by Henry-Louis de La Grange that reveals many important
details. It is to some of these that we can turn to support what may appear to be speculative suggestions
of various interpretations.
Formal Models
Despite their complexity the broad structures of Mahler's Seventh Symphony can be related to traditional
symphonic forms. The first movement is a loose sonata structure with a slow introduction. That some of
the introduction's material is worked into the form of the main allegro should not concern us much at the
moment. The main allegro hinges around a thrusting march-like first theme and an important subsidiary
section (Mit großem Schwung, bar 118) which is strongly linear, with a rich melodic chromaticism and
frequent emotionally charged pauses on the second beats of the bar. It is not difficult to recognise a
similarity with the comparable music from the first movement of the Sixth Symphony, a point made by
numerous commentators.
Leaving aside for the moment the extended development section, we can see a certain regularity in the
recapitulation. The return of the adagio introduction (bar 338) and its transformation into the Allegro
come prima - maestoso (bar 373) is broadly similar to the opening, but significantly abbreviated by the
omission of the march which has played an important part in the music so far. The second theme (bar
465) charts a similar course, but now without the emotionally charged pauses found in the exposition. The
aspect that has drawn most comment is the extended development section (bars 171-337) which works
through numerous sub-sections of very varying tempos. In the slow sections time stands still: Meno
mosso (bars 256-265), Etwas gemessener (bars 298-316) and the subsidiary theme (bars 317-337). The last in
B major forms a link to the return of the slow introduction. The later appearances of the march which was
first heard in the slow introduction are most interesting: the end of the exposition in which it is very loud,
then augmented as a slow chorale in the middle of the development (bars 256-265) and finally loudly in
the coda. We are entitled to ask what is the significance of these changes in the character of the march,
and further what is the relationship between the main themes and between them and the march itself.
The first of the Nachtmusik movements can be seen as a very varied sonata model, but Constantin Floros
identifies a plausible quasi-arch model.(9) The fact that this rather neat plan is strangely unrecognisable in
practice should alert us to the problems of interpreting the music in traditional terms. The sections that
return always come in a different form, something which demands some explanation. The Scherzo leads
to more difficulties of interpretation. The simple plan could look like this: scherzo with repeat, trio,
scherzo with repeat, coda. By further sub-dividing the movement into many sections, it is possible to give
some details of Mahler's micro-structural working, but this kind of analysis becomes pointless.(10) What
is in fact a patchwork of materials that are juxtaposed in various ways is a contradiction of the simple
plan. We might also ask why the first reprise is in the 'wrong' key - 'false recapitulation' is Berio's
phrase.(11)
The second Nachtmusik movement would appear more straightforward. Floros postulates the sequence:
introduction - main section - development - trio - recapitulation - coda.(12) There is a considerable rondo
feel about the music, although it is impossible to connect this with traditional rondo structures.
The problematic finale in some ways is easiest to understand in traditional terms, that of a baroque
ritornello. Floros identified six elements that are used for the ritornello theme. In only two of the eight
appearances of the ritornello (the first and the last) do all six elements appear; in all the others only some
(between one and four) elements are used, a procedure in line with Baroque practice.(13) Why did Mahler
use this strangely archaic formal plan?
In all five movements one can recognise some vestiges of traditional formal structures. However, in all
cases, there are good reasons to believe that there is much more to the music than a mapping of it on
to an earlier model.
Musical Imagery
We now can look at one of the richest sources of clues in this symphony: the composer's use of musical
imagery. This can take a number of forms: birdsong, country sounds, military rhythms, dance-like
passages, references to other works of his own, melodies and melodic fragments previously set by Mahler
to words of some significance and quotations or quasi-quotations from other composers' works.
The opening movement seems to be carrying on the drama of the first movement of the Sixth Symphony -
the melodic shapes of the main themes are clearly related. The first Nachtmusik shows this in a
particularly vivid form. An early passage (bars 9-27) was said by Alma Mahler, the composer's wife, to
represent birdsong in its triplet woodwind figures. There are numerous military features that take their
inspiration from the Wunderhorn song Revelge, composed in 1899. Note especially the rhythm: quaver, 2
semiquavers, 2 quavers, crotchet. The three references in this movement (bars 28-29, 187-88, 337-38) to
the motto of the Sixth Symphony (major to minor chord shift) must hold some significance. The
appearance of cowbells as part of an episode that recalls the echoing horn calls of the introduction must
have some possibly pastoral significance. Peter Davison wrote: 'The presence of cowbells, echoing horns,
march music and exotic dance rhythms could initially seem to convey a[n] unconnected sequence of
extramusical significance'.(14) We can listen to the spectral whirligig music of the scherzo and imagine
all kinds of nocturnal activity, some of it very sinister. The second Nachtmusik, a movement that has
caused no end of problems for analysts using traditional criteria, is also full of evocative ideas that
conjure up the image of a beautiful serenade, lovingly played by the wind instruments with gurgling
accompaniments from the clarinets and gentle plucking from the guitar and mandolin. But perhaps the
most intriguing aspect of the work is the way that the finale seems to derive its main theme from the
prelude to Wagner's Die Meistersinger. Even the appearance of a derivative of Franz Lehár's Die lustige
Witwe ('The Merry Widow') would seem to have some hidden significance.(15)
The polarity between traditional forms which Mahler superficially follows and explicit programme music
(which Mahler rarely uses) is put very forcibly by Peter Davison: 'Complications arise from the need to
explain anomalous formal characteristics within the framework of traditional formal concepts, instead of
within a common-sense approach to the musical narrative.'(16)
Musical Narrative
Other clues to a narrative interpretation are readily forthcoming. The main thematic material of the
opening movement interacts in a very interesting way. The main allegro section has two groups of
materials that have something in common with the first movement of the Sixth Symphony. What is
interesting is the brief and innocent sounding march that first appears in the introduction (bar 19) which
shows dramatic transformations in its reappearances. What is being indicated by these changes? John
Williamson noted: 'In Mahler, Sonata form and march are frequently equated with motivic struggle, even
motivic disorder.'(17) There is indeed some conflict between the main allegro's processes and this march.
It appears fast (Flott) in bars 136-44 and bar 238, but very slowly and quietly in the central development
section (Meno mosso) at bars 258-65 and (Sehr gehalten) at bars 304-11. Its final appearance (Frisch) at
487-94 is very powerful. It seems to infiltrate itself into the other thematic material.
The Nachtmusik movements have strong connections to traditional forms in their reprises and
symmetries. One can sense that they are probably more descriptive than narrative in their nature. The
scherzo that separates them is different. It is the third of a series of developmental scherzos that Mahler
composed for these middle period symphonies. Its complex structure relates clearly to the traditional
scherzo in its overall plan: scherzo with repeat, trio, scherzo with repeat, coda. This simple plan conceals
the constant changes to the thematic material at each appearance. Very notable is the reprise, part of
which appears a semitone higher, in E flat minor rather than D minor. Also of some significance must be
the section marked 'Wild' (bars 416-20) in which there is a violent outburst from the trombones and tuba.
The sinister element in this movement disrupts the generally peaceful mood of the two Nachtmusik
movements.
The finale presents the conflict between form and content dramatically. The most plausible model seems
to be a Baroque ritornello with the complete version of the opening part (all six elements) heard only in
the first and last of eight appearances. In the other reprises only between one and four elements are used.
The tonic key of C major is used in only the first three appearances and the last. Unlike in the first
movement, the music barely stops for breath. There are no slow episodes. Is there any narrative
significance in this plan? The return of the allegro music from the first movement must also surely have
some, probably narrative, significance. This type of reference to an earlier movement is, of course, a very
common procedure in Romantic symphonies.
So far we have only a disconnected group of suggestions about narrativity in this music. It does not appear
to have the clear sense of direction found in its two predecessors. The Fifth Symphony presents in its first
two movements a conflict that ends in an attempt at a triumph (the D major chorale) which collapses into
fragments and a return to A minor. After an invigorating developmental scherzo and a beautiful
intermezzo (the Adagietto), Mahler takes us through a rondo that again rises to the D major chorale found
in the second movement, but this time it sustains its tonality and key right up to a final triumph.
In the Sixth Symphony, the process is turned on its head, or nearly so. The first movement contrasts a
vigorous minor-key march section with exultant major-key material (the composer referred to this as his
'Alma' music, referring to his wife). Taking the order of the movements that Mahler adhered to in his
lifetime (Andante moderato second, Scherzo third), this is followed by the calm idyll of the Andante
moderato. The parodistic scherzo shatters this calm and mocks the music and tonalities of the first
movement. It propels the narrative with progressively compressed appearances of the main scherzo to a
collapse whose tonalities link directly with the finale. This mammoth movement with its rondo-like
introduction superimposed on an extended sonata structure leads us through great striving for the same
goal as the Fifth Symphony. It is the three hammer blows, placed in somewhat unpredictable places, that
make the narrative convincing, with the final collapse horribly inevitable. Mahler's removal of the third
hammer blow seems to have been the result of a superstition about his own fate. Nothing so obvious can
be related to the Seventh Symphony, so are we asking the wrong questions?
Biographical Evidence
An intriguing piece of biographical information has a bearing on the Seventh Symphony. In 1908 (or
possibly 1909) Mahler conducted in Amsterdam a performance of this symphony, which was prefaced by
three works by Wagner: Eine Faust-Ouvertüre, Siegfried Idyll and the prelude to the opera Die
Meistersinger. This is contained in a letter from Amsterdam to his wife.(18) This may be an example of
Mahler's imaginative programme planning, but it could also be a clue to the inspiration for the symphony.
There is a considerable amount of evidence that Mahler used ideas from his own and other composers'
music in his own works. Inevitably these ideas are modified, sometimes nearly out of recognition. They
occur with such frequency that they can hardly be considered incidental. In his article on the
phenomenon, Henry-Louis de La Grange presents a large number of reasons for the composer doing
this.(19)
A Faust Symphony?
The first suspicion of some underlying idea in the choice of works might be the presence of the name of
Faust. Mahler was very familiar with Goethe's Faust, as his setting of the last section of part 2 in his
Eighth Symphony was to show. The Faust Overture might also be the catalyst for the structure of the first
movement of the Seventh Symphony. The overture itself is a single-movement allegro (Sehr bewegt) with
a slow introduction (Sehr gemessen). Despite the fact that it lasts only ten minutes and its internal
construction is relatively straightforward, it could have acted as a distant model for the first movement of
the Seventh Symphony. It was originally intended in 1840 as an overture to Goethe's Faust Part 1. Is the
connection with this idea just a coincidence or are we looking at a Faust symphony? If one does follow
the Faust theory, a great many apparently disconnected features fall into place (see Table 1).
The plan of the first movement presents less of a problem than has been suggested. The two main groups
of the allegro are clearly differentiated in character. It is possible to imagine that the thrusting first theme
stands for Faust himself and the more romantic and slightly sentimental second theme for Gretchen. The fact that the
latter corresponds with the section in the Sixth Symphony that Mahler said represented his wife, Alma,
adds further corroboration to this idea. The march that appears briefly in the introduction of the Seventh
Symphony can be seen as a disruptive element, as a sinister and slightly threatening element at first and a
much more powerful one at the end of the exposition. Its appearance, now as a quiet chorale slowed down
so that it is almost unrecognisable, in the slow episodes of the middle of the movement, is calm and
restrained. In the recapitulation it does not return at first, presumably because it was held back until
the end of the movement, where it makes an aggressive reappearance. If we follow this, the three elements
are seen to be in some kind of unresolved conflict.
The vivid and picturesque first Nachtmusik seems to be the dreamy Faust himself. There are recollections
of the countryside and memories, some slightly threatening. The setting of night is entirely in character
with what we find in Goethe's Faust; scene after scene has a nocturnal setting. The sub-plot of the nature
poetry of Eichendorff fits in perfectly with this. The sinister element which casts its shadow on the scene
is the three appearances of major-minor chord shift that acted as the motto for the terror in the Sixth
Symphony. They are almost like the three hammer blows in the finale of the latter. This should prepare us
for the terrifying experience of the scherzo.
Without using a Wagnerian point of reference, the scherzo can be seen as a sinister night ceremony,
perhaps even an encounter with Mephistopheles. There are screams, 'things that go bump in the night',
and eerie rustlings. The reprise that is in the 'wrong' key can be thought of as a bad omen. Then there is at
bar 146 a savage outburst, from the trombones and tuba, marked Wild, that seems to be the final waltz of
the devil. What follows is a typically Mahlerian collapse, with disjointed fragments that disappear into
nothing just as in the scherzo of the Sixth Symphony. What can be the significance or meaning of this
movement? One possibility can be found toward the end of part 1 of Goethe's Faust. The scene called
'Walpurgisnacht' concerns a nocturnal meeting in the Harz Mountains between Faust and the devil,
Mephistopheles. The Nachtmusik that follows seems to know nothing of what has taken place. The
delightful serenade that Peter Davison maps on to Wagner's Siegfried Idyll, the latter composer's song of
love for his wife, appears as a way of obliterating the memories of the scherzo. It is not implausible to
think of this as a Gretchen movement.
This brings us to the finale. Mahler clearly wants some sort of redemption. In the Fifth Symphony, he
achieved it on the second attempt. In the Sixth Symphony, he failed heroically, despite a moment of
tranquillity early on. That work's Mephistophelean scherzo destroyed that peace. In the Seventh
Symphony Mahler must have wanted to expunge the overwhelming experience of the Sixth's finale. It
emerges then as a headlong and joyous affirmation of his belief in love. The quasi-quotation from the
prelude to Wagner's Die Meistersinger must surely confirm this - the story of Walter in the opera
vindicates his belief in love, something that will triumph over everything. Mahler did not want the idea to
be lost, so he hardly lingered at all in this finale. There are no slow episodes and the second main material
is specifically marked to be played at the same speed as the first. Just in case the music did not convince,
Mahler made a second attempt to represent the redemption of love by a woman, in the finale of the Eighth
Symphony. This time he made no mistake: it was explicit and open.
There is one question that remains to be answered and that is, if one accepts this Faustian interpretation of
the Seventh Symphony (and there will be many who find it impossible to agree with the points presented
here), to what extent are we talking about Faust, the mythical hero, or are we really talking about Gustav
Mahler, the composer himself? Because some of the earlier symphonies, particularly the First and Sixth,
do seem to be concerned with a hero who can easily be seen as a parallel with Mahler, it is not
unreasonable also to connect the alleged Faust figure of the Seventh Symphony with the composer. In that
case we are dealing with another 'biographical' work whose secret has for so long been hidden
in the felicities of the quite remarkable nocturnal central movements and the controversies of the
perplexing outer ones.
Table 1
Proposed Programme
4 | Andante amoroso | Nachtmusik | Wagner's Siegfried Idyll | 'Love, love, love', based on Wagner's Siegfried Idyll
5 | Allegro ordinario | Rondo-Finale | Wagner: Prelude to Die Meistersinger and Lehár's Die lustige Witwe | The triumph of love over the devil, Mephistopheles.
References
1. Deryck Cooke: Gustav Mahler: An Introduction to his Music, Faber, London, 1980, p.88
2. Cooke,1980, p.88
3. James L. Zychowicz (ed): The Seventh Symphony of Gustav Mahler: A Symposium, University of
Cincinnati, Cincinnati, 1990, p.v
4. Peter Franklin: The life of Mahler, Cambridge UP, Cambridge, 1997, pp.158-59
5. Henry-Louis de La Grange: 'L'énigme de la Septième', in Zychowicz (ed), op.cit., p.13
6. Henry-Louis de La Grange: Gustav Mahler, Vienna: Triumph and Disillusion (1904-1907), Oxford
UP, Oxford, 1999, pp.849-53
7. In Alma Mahler: Gustav Mahler Memories and Letters, ed. Donald Mitchell, John Murray,
London, 1968, pp.325-27
8. Tibor Knieff: The New Grove Dictionary of Music and Musicians, Macmillan, London, 1980, vol
8, p.511
9. Constantin Floros: Gustav Mahler: The Symphonies, Scolar, Aldershot, 1994, pp.198-99
10. See my investigation in Niall O'Loughlin: 'The Rondo in Mahler's Middle Period Symphonies:
Valid Model or Useful Abstraction', Muzikološki zbornik 35, 1999, p.138
11. Talia Pecker Berio: 'Perspectives of a Scherzo', in Zychowicz (ed): op.cit., p.88
12. Floros: op.cit., p.204
13. Floros: op.cit., pp.206-11; Martin Scherzinger: 'The Finale of Mahler's Seventh Symphony: A
Deconstructive Reading', Music Analysis vol 14, 1995, no 1, pp.69-88
14. Peter Davison: 'Nachtmusik I: Sound and Symbol', in Zychowicz (ed): op.cit., p.68
15. Henry-Louis de La Grange: 'Music about music in Mahler: reminiscences, allusions or quotations?',
in Stephen E. Hefling: Mahler Studies, Cambridge UP, Cambridge, 1997, p.166
16. Peter Davison: 'Nachtmusik I: Sound and Symbol', in Zychowicz (ed): op.cit., p.68
17. John Williamson: 'Mahler and Episodic Structure: The First Movement of the Seventh Symphony',
in Zychowicz (ed): op.cit., p.34
18. Alma Mahler: Gustav Mahler: Memories and Letters, ed. Donald Mitchell, John Murray, London,
1968, pp.308-9
19. See in particular: Henry-Louis de La Grange: 'Music about music in Mahler: reminiscences,
allusions or quotations?', in Stephen E. Hefling: Mahler Studies, Cambridge UP, Cambridge, 1997,
pp.122-68
Proceedings paper
environmental influences. This is what is most fundamental about man's nature, that is, that he has to
learn everything. The 'nature' of man is that he acts in the world as a being with the limits, strengths
and weaknesses of which he is persuaded. This peculiarity of man (his vulnerability to influence and
persuasion) can and should be the basis for his growth in all types of development and transcendence.
As we think about the whole matter of education, we should remember that the history of man is that
he is always surpassing what once was believed to be ultimate limits--it clearly matters what people
believe to be true about man's potentialities. For example, we now think a child's cognitive abilities
are developed much earlier than was previously thought. Pre-kindergarten training is now known to
be pedagogically essential (Gordon, 1986); even pre-natal 'education' is presently widely researched.
When does the music experiencing begin? The music environment of the young child has long been
the concern of professional musicians, but little has been discovered about the effects on the foetus of
the music and sound environment until the last two decades or so. Wilkin (1994), after a decade of
research, reports that as early as 38 weeks gestation, the foetus appears to be selective in the music to
which it responds--thus she opines that the learning process in humans begins before birth. Also,
during the last two months of gestation it was possible to condition the human foetus. Abrams and
Gerhardt (1997) offer more details to substantiate the case for prenatal music for the foetus by saying
that the degree to which airborne music is heard by the foetus depends on the attenuation features of
the abdominal surface, the distance to the foetal head and the low-pass features of bone conduction.
However, music produced by mechanical vibrators against the skin is transmitted more effectively and
with less attenuation to the amniotic fluid. (It has been known for many years that the peripheral and
central components of the auditory system are formed and functional prior to birth.) In consideration
of the continuing research results being reported on the importance of aural experiences for the
unborn child, this educative process may become recognized as equally important as we now
recognize the place of musical experiences in early childhood development.
What does the 'foetal training' have to do with music cognition? The aural sense, being the first of the
five senses to be physically developed, can be musically stimulated, thus enhancing brain
development in its functioning--and the sooner begun, the more learned in the lifetime of the human.
Hodges (March, 2000), although making no claim about the efficacy of pre-natal musical training,
reported that we know more and more about the (musical) environment producing physical
brain changes (wiring and structure), that music influences brain development, and that, because of
the plasticity of the brain, early development (music stimuli) will have a decisive, long-lasting impact.
Contrarily, early negative (no music) experiences may have dramatic and sustained consequences.
One concern very basic to those of us in the field of pedagogy (and all musicians are music
pedagogues) is to shape experiences as a way for persons to act, guided by intelligence and respect for
life, so that needs are satisfied and that they will grow in awareness, confidence and the capacity to
experience meaningfully the humanly unique aspects of life. Music experiences minister to the three
basic 'needs' of the human, viz., spiritual, emotional, and intellectual (the first of the three, spiritual, is
related to man's need to transcend himself . . . and music is the major enhancer of the individual's
worship experiences). Also, two categories into which responses to music by the human have been
labeled are affective and aesthetic. Most authors have focused on the differences and similarities of
the labels while a very few have defined them as distinct one from the other. For most, the aesthetic
experience is an intense, subjective, personal experience that includes some mood, emotional, or
feelingful aspect, that is a component of the affective response. The affective response is generally
discussed as a less superficial response than the aesthetic response/experience. Such terms as taste,
attitude, or preference have been used inconsistently. (Parker, 1978)
The relationship (interaction) of psychological concepts and music is also exemplified by the fast
growing discipline of Music Therapy which needs to be mentioned here in the context of the
importance of musical experiences in the human's life. Mental illness is certainly a psychological
aberration which is treatable in a music therapy setting. Oversimplified for purposes of this
discussion: although a person may have withdrawn from reality (is non-communicative) and may
refuse to hear/interpret the spoken word, he cannot refuse to hear music stimuli and respond to
them IF the music stimuli are reminiscent of his past (learned) music experiences, because those have
been encoded in his brain.
The origin of psychology (as a discipline) was that it was the 'science of the soul' (psyche equals soul
or mind, logos equals science). However, we are not always able to distinguish between what
concerns the body and what belongs to the mind. In applying knowledge and research from
psychology in music to the endeavors of musically educating, we need not think of splitting the
soul/mind and the body--we simply deal with the musical behavior of the individual--holistically, that
is.
Each human has a need for music (there is no society/culture that does not have a music). Therefore,
what music, when? Briefly, Sloboda (1998) opines that feelings are tied to musical structure, that
there is a consistency in musical responses given the relatively same environment. Basically he is
saying that there is a strong cognitive content in the emotional experience that results from the
musical stimuli. He is making a strong case for the cognitive approach in the experiencing of music
stimuli.
There should be constant interaction between the fields of music and psychology, because
psychologists are interested in the interpretation of human behavior and because music, in addition to
being an art, is a form of human behavior which is unique and powerful in its influence. Some
questions which come to mind, in attempting to successfully investigate this uniqueness and influence
of music on the human's behavior might include:
1. What is the language aspect/syntax of music?
2. What is the appropriate combination of aural, visual, and kinesthetic experiences? and,
3. From where does the meaning of music emanate? In searching for answers to such questions,
perhaps the most far-reaching endeavors by the music profession were the three Ann Arbor
symposia, held in 1978, 1979, 1982, involving circa 150 musicians and psychologists.
The format of the symposia centered around questions/presentations by 12 musicians and responses
by 12 psychologists. The six categories selected for discussion and debate were: auditory perception,
motor learning, child development, cognitive skills, memory and information, and affect and
motivation. Subsequent publications, presentations at profession meetings, and seminars were
developed, and answers promulgated/published disseminating the long term pedagogical goals of the
music profession. Out of the intense discussions between musicians and psychologists at the Ann
Arbor symposia important benefits are coming, not only pedagogically, but for institutional
collaboration and for reaching out to other disciplines, e.g., all the social sciences as well as the pure
sciences.
At this time it would be germane to generalize on questions/answers under each of the six categories.
As to auditory perception, people come to any task, especially music, with their sum heritage,
training, and disposition--with apparent auditory limitations depending more on the limitations of the
stimulus than on the person's ability to discriminate. As to motor learning, despite the 'staggering' complexity
of music performance, psychomotor execution still attracts only a paucity of research in the discipline;
only speculative replies were managed at the symposia, such as 'globals' over 'locals'. Wilson (1986),
with much more lucidity offered helpful understandings in motor learning, such as answering why it is
refereed research papers, representing 26 countries. The presenters were about evenly divided--50%
psychologists and 50% musicians. Hodges (1997) addressed one of the main concerns of both
disciplines as follows:
For its current richness, music psychology is a fractionated discipline. Much high quality
work is going on here and around the world, but often without an overarching conception
of the field as a whole. One of the reasons this is so is that differences in training may keep
us from a more coherent and complete view of the field. . .simplistically, musicians may
regard some psychologists' research as musically naive, while psychologists may view
the research of some musicians as less than rigorous. Other differences arise between
basic and applied researches, between researchers and practitioners, between differences
based on geography or language, etc. (p. 33)
The ESCOM Conference was, in essence, a call for unification of the goals and objectives of the
researchers and practitioners of the two disciplines, as promulgated by Hodges (1997): . . . one of the
simplest and quickest changes to make would be for everyone working in the field to adopt a multi-
and interdisciplinary view of music psychology. That is, we should be encouraged to read broadly and
to adopt a holistic attitude toward musical behavior. (p. 40)
It is my feeling that there are important implications in what the psychologists can tell us about such things as:
1. Is there a difference between music perception and general perception;
2. Is there transfer of auditory perceptual skills to music from other modes of listening; and,
3. Is there a best learning time, mental development, for the acquisition of music perceptual skills?
For another instance, could it be that pitch and not rhythm is the crucial factor in learning
music, as some psychologists are now suggesting, which is contrary to what we have believed
these many years?
We are also having some reports of research that tell us that we do not learn to read music as we learn
to read our language, viz., learning musical syntax is different than learning language syntax.
(Hodges, March, 2000) The foregoing idea is supported by a previous study reported by Storr (1992)
wherein the left hemisphere was sedated while the right hemisphere was in a normal state. The subject
exhibited the emotional effect of music that was heard. As practitioners, we will have to remember
that research reports will be limited in effectiveness only as we fail to put them into practice. It is our
concern to provide the appropriate environment for the human as a child to coincide with his
developmental potentials.
Another major source of confirmation is Gagne's (1970) hierarchies of learning which provide an
understanding of essential conditions for learning, labeling eight learning types from simple to complex,
the prerequisite capabilities, and the external conditions of learning. The empirical adaptation of these,
in connection to the cognitive, psychomotor, and affective domains, has helped music teachers in their
successes because there is a simple to complex continuum for almost all music outcomes. This
continuum is irrevocably tied to the kind of learning necessary for step-by-step achievement.
Developmental psychology has influenced music education curriculum makers to almost establish
firm and inalterable stages, determined by age, for the musical development of children. This means
that steps in musical development are linked with age and an exact coincidence of a certain level of
musical development for the certain age is said to exist. Michel (1973) warns against the
establishment of such narrow and rigid age limits because they can have a serious effect on the whole
range of music education which would lead to selecting teaching materials matched with each stage
and any transgression of the age stages is branded as an educational crime. He feels that this rigid
view overlooks the fact that the essential process of musical development is always an individual
matter in which age can be one factor among several, and not the sole determinant. He suggests that 'far
greater significance be attached to the opportunities that exist for practical music activity on the part
of the child, that is, his active dialogue with the musical phenomenon of his environment'.
Michel does suggest that music teachers should not wait for the development alleged to occur
spontaneously at particular age levels, but that teachers must provoke and encourage development to
the fullest possible extent by conscious organization of musical activity. Not only is it imperative that
in the training of young children we teachers be concerned about the nature of their
acquiring perceptual technical skills (cognitive and psychomotor) but we must additionally concern
ourselves with the training of each individual to have the most possible significant aesthetic
experiences.
Significant aesthetic experiences, it follows, are based on choices that have been developed from past
musical experiences; mainly I am referring to musical tastes--which are defined as 'stable, long-term
preferences for particular types of music, composers, or performers'. (Russell, 1997, p. 141). Tastes
develop out of experiences gained in home, church, club, school, and out of contacts with the concert
stage, recordings, radio, television, and the printed page. The agencies of education, propaganda, and
censorship help persons to revere certain composers and/or performers, and to take less seriously
other composers and their works. Age, intelligence, special training, and all musical experiences can
be important variables in this process of taste formation. In summary of how we
come to acquire standards of musical taste: musical taste is not whimsical, and it is
culture-bound, not culture-free.
The idea that the need for music is universal is a viable premise. Therefore, if there is music in every
society, there will be some sort of music education in every society, whether formal or informal.
Music is learned wherever it occurs; therefore the principles of learning are at work, hopefully
psychologically based. Thus, the disciplines of music and psychology are compatible, not
alternatives. This interdependence becomes more profound as the mental processes develop
sequentially, and as the individual develops a set of music preferences/tastes. The sheer ubiquity of
music's presence in each society, whether as an art form or in a functional mode, establishes music as
a cultural activity, an artifact, which shapes and controls so much of human behavior in an
all-pervasive manner.
Finally, then, teachers/performers of music must search out and employ all the psychological avenues
of learning in order to be a major force in the achievement of the uomo universale--that maximal
musical literacy does result for all the world's societies. Utilizing psychological principles of learning
in serving as purveyors of cultures, teachers of music will serve as catalysts in achieving world-wide
musical literacy. And, this must be achieved before the world's population can experience the
sought-for profundity of humanness! 'We are the world.'
References
Abrams, R. M. & Gerhardt. K. K. (1997). Some aspects of the foetal sound environment. In I. Deliege
& J. Sloboda (Eds.), Perception and cognition of music(pp. 83-99). Hove, East Sussex: Psychology
Press Ltd.
Bannon, N. (1999). Out of Africa: The evolution of the human capacity for music. International
Journal of Music Education, 33, 3-9.
Bjorklund, D. F. (1995). Children's thinking: Developmental function and individual differences.
New York: Brooks/Cole Publishing Company.
Bowman, E. (1998). Universals, relativism, and music education. Bulletin of the Council of Research
in Music Education, 135, 1020.
Boyle, J. D. (1992). Evaluation of music ability. In R. Colwell (Ed.), Handbook of research on music
teaching and learning (pp. 247-265). New York: Schirmer Books.
Carlsen, J. C. (1979) Ann Arbor symposium: A forum on the psychology of music education. Journal
of Research in Music Education, 27(1), 51-52.
Crozier, W. R. & Chapman, A. J. (1985). Psychology and the arts: The study of music. Music
Perception, 2(3), 291-298.
Crummer, G. C., Walton, J. P., Wayman, J. W., Hantz, E. D., & Frisina, R. D., (1994). Neural
processing of musical timbre by musicians, nonmusicians, and musicians possessing absolute pitch.
Journal of the Acoustical Society of America, 95(5, Part l), 2720-7.
Davidson, L., & Scripp. L. (1992). Surveying the cognitive skills in music. In R. Colwell (Ed.),
Handbook of research on teaching and learning music (pp. 392-413). New York: Schirmer Books.
Frisina, R. D. & Walton, J. P. (1988). Neural basis for music cognition: Neurophysiological
foundations. Psychomusicology, 7(2), 99-107.
Gagne, R. M. (1970). The conditions of learning. New York: Holt, Rinehart, and Winston, Inc.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.
Gaston, E. T. (1957). Factors contributing to responses to music. Music therapy. Lawrence, KS: The
Allen Press, pp. 23-30.
Gordon, E. E. (1986, April). Musicality: Preschool/early childhood pedagogical. Paper presented at
the biennial meeting of the Music Educators National Conference, Anaheim, CA.
Hargreaves, D., & Zimmerman, M. P. (1992) Developmental theories of music learning. In R. Colwell
(Ed.), Handbook of research on music teaching and learning (pp. 377-391). New York: Schirmer
Books.
Hodges, D. A. (1997). Standing together under one umbrella: A multidisciplinary and
interdisciplinary view of music psychology, In A. Gabrielson (Ed.), Proceedings: Third triennial
conference of the European Society for the Cognitive Sciences of Music (pp. 33-42). Uppsala,
Sweden: Uppsala University Press.
Hodges, D. A. (2000, March). Brain research applied to music education. Paper presented at the
biennial meeting of the Music Educators National Conference, Washington, D. C.
Killam, R. N., & Baczewski, P. (1985). The perception of music by professional musicians. In G. C.
Turk (Ed.), Proceedings of the research symposium on the psychology and acoustics of music (pp.
71-82). Lawrence, KS: University of Kansas Press.
Kleinen, G. (1997). The metaphoric process: What does language reveal about music experience? In
A. Gabrielson (Ed.), Proceedings: Third triennial conference of the European Society for the
Cognitive Sciences of Music (pp. 644-649). Uppsala, Sweden: Uppsala University Press.
Latham-Radocy, W. B., & Radocy, R. E. (1996). Basic physical and psychoacoustical processes. In
D. A. Hodges (Ed.), Handbook of music psychology (pp. 69-82). San Antonio, TX: IMR Press.
Florida.
Wilson, F. (1986, April). The neurological basis of musical ability. Paper presented at the biennial
meeting of the Music Educators National Conference, Anaheim, CA.
Proceedings abstract
Dominique M. Richard
drichard@seas.upenn.edu
Background.
Aim.
Main contribution.
Implications.
Proceedings abstract
carlos-x-rodriguez@uiowa.edu
Background:
Aims:
The purpose of this study was to replicate the earlier study using Italian
children in order to compare their performance with that of American children,
with particular attention to the distinctions that might obtain in a different
cultural environment.
Method:
Sixty Italian children aged seven, nine, and eleven performed a MIDI file using
the computer keypad as a trigger for musical events. Four days later, the
children were asked to identify which of three performances of the same MIDI
file from within their age group was their own. The children also verbally
explained their decision. Three sets of factors were used to categorize the
subjects' responses; sensory variables, cognitive strategies, and product
variables.
Results:
At the time of this submission, the data are being collected in Italy. We will
report discrimination scores as percentages for age groups. We will categorize
verbal data using a judging task. To detect age-related tendencies in the
verbal responses, we will use ANOVA to test for a linear trend component.
Conclusions:
Proceedings paper
Anders Friberg, Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm
1. INTRODUCTION
In Western European tradition, musical works generally exist in the form of written notation, or score, which has been produced by a composer and
which must be converted into sound by a (group of) performer(s). The recent decades have witnessed an increase in empirical studies on musical
performance (Gabrielsson 1999). It is generally agreed that if the score were converted into sound without any
modification, the result would be the so-called deadpan version, i.e. something musically unacceptable. It is believed that expressive devices
complementary to the score are used by performers mainly for two purposes. First, to make it easier for the listener to differentiate between
musically relevant categories in the domains of pitch and duration and, second, to provide for a better grasp of the hierarchical structure of the musical
work (Sundberg 1999b).
Friberg (1991) has successfully modelled the performance of a musical work with some twenty-odd generative rules that automatically convert input
note files into sound performance on a synthesizer. The rules introduce into the performance micropauses, lengthenings and shortenings of tone
duration as well as long and short-term increases and decreases of sound level. The system of rules should be understood as a generative grammar
of musical performance, reflecting the musical competence available to its authors.
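To make the idea of such generative rules concrete, here is a minimal sketch in that spirit. It is emphatically not Friberg's actual rule system: the note representation, the 10% phrase-final lengthening, and the 20 ms micropause are invented for illustration only.

```python
# Toy "generative performance rules" in the spirit described above.
# All parameters (10% lengthening, 20 ms micropause) are invented
# for this example and are NOT Friberg's actual rule values.

def phrase_final_lengthening(durations, phrase_ends, factor=1.10):
    """Lengthen the nominal duration of each phrase-final note."""
    return [d * factor if i in phrase_ends else d
            for i, d in enumerate(durations)]

def micropause_after_phrase(durations, phrase_ends, pause=0.02):
    """Shorten the sounding part of each phrase-final note,
    leaving a micropause (gap) before the next phrase."""
    return [d - pause if i in phrase_ends else d
            for i, d in enumerate(durations)]

# Nominal (score) durations in seconds for two four-note phrases.
nominal = [0.5, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5, 1.0]
ends = {3, 7}  # indices of phrase-final notes
performed = micropause_after_phrase(
    phrase_final_lengthening(nominal, ends), ends)
```

A real system like Friberg's chains many such rules, each applying a small, musically motivated deviation to the nominal note values before synthesis.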
There exist other cultures in the world, however, which do not resort to prescriptive notation in the process of musical communication. An absolute
majority of musical folklore, as well as much of popular music (e.g. jazz) belongs to the oral tradition. Performance in such traditions is the result
of improvisation, i.e. spontaneous (re)creation of music from memory. If notations exist for folk music performance, these must be regarded as
descriptive, in the sense that they have been produced post factum by an ethnomusicologist or anthropologist, rather than in advance by a
composer. (One can thus hardly think of the category 'composer' in folk music.) Due to the fact that notation in such cases aims at describing what
is happening in the performance, as opposed to recreating the music anew, the target of such notation is research, not performance.
The old Baltic-Finnic folksongs are an example of such an oral musical tradition (Lippus 1995). This tradition has been shared by native speakers
of most of the Baltic-Finnic languages: the Finns, Estonians, Karelians, Votes, and Izhorians. The other two Baltic-Finnic ethnic groups, the
Vepsians and Livs (Livonians), do not evidence it. (There are about five million speakers of Finnish and about one million of Estonian in the
world; both of these succeeded in establishing nation states after World War I. The other Baltic-Finnic languages have significantly fewer speakers
left.) The old folksongs are also called Kalevala songs (after the famous Finnish epic) or runic songs. In this paper the three terms will be used as
synonyms. The old folksongs are estimated to be two to three thousand years old. Although preserved in a relatively extensive body of archival
recordings, they have been fading from daily circulation since the 18th century. The written part of the recordings has mainly been collected during
the second half of the 19th and the first half of the 20th century; the majority of sound recordings come from the years 1930 to 1970.
In the present study, we will look into the extent of similarity between musical competence in the performance of works of the professional musical
tradition of Western Europe and that of the old Baltic-Finnic folksongs. We will be restricting ourselves to the domain of duration, i.e. we will not
study pitch, sound level or the timbral characteristics of musical performance. The majority of musical performance studies have so far been
concentrating on the European classical piano repertoire from the 19th century, their favourite objects of study being shorter compositions by e.g.
Fryderyk Chopin or Robert Schumann (e.g. Repp 1998, 1999a and b). No doubt there are substantive differences between the performative
situations of such piano pieces and runic folksongs. We see the following two differences to be the most significant.
1. A 19th century piano work is performed from a score, while a folksong is improvised. At a first glance, the conversion of note values from the
score into acoustic events of certain duration seems to have no equivalent in folksong performance. We have to take it into account, however, that
the runic folksong melodies are mostly isochronous, i.e. consisting of note durations of (nearly) equal value. This enables us to compute the
average note duration value over a certain portion of the musical work (provided that the tempo remains constant) and to hypothesize that
deviations from the average note duration value are used by the performer for expressive goals, in a similar way to that in which the performer of a
piano piece by Chopin employs deviations from normative durational values for expressive purposes.
2. In folksongs, the durations of sound events may also depend on the sung text (lyrics) and the verse metre. In the Baltic-Finnic languages,
quantity plays an important role in speech prosody. Duration differences in these languages serve a semantic function, i.e. they distinguish the
meanings of words. In the Estonian language, one and the same disyllabic sequence may have three different meanings depending on whether the
approximate ratio of its constituent syllables equals 0.66, 1.5 or 2.0 (Lehiste 1997). Also the metre in Estonian folksongs, defined as trochee, uses
oppositions not only between the stressed and unstressed syllables but also between the long and short syllables for contrasting the ictus and
off-ictus positions in verse lines (Tampere 1983). The issue of the extent to which the requirements of word prosody and metre are combined or contrasted
in the musical realisation remains largely open. If the former were supported by musical rhythm in folksongs, it would certainly
enhance the intelligibility of words to the listener. What cannot be ruled out, however, is that it may be difficult for the performer to meet the
structural requirements coming from three separate domains (speech, metre and music) at the same time. Nor can it be ruled out that possible
conflicts between the three systems may be creatively used by performers for aesthetic purposes.
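Both computations outlined in points 1 and 2 above are simple to state concretely. The sketch below (with invented sample values; the nearest-target classification rule is our simplification, not Lehiste's actual procedure) computes per-note deviations from the mean duration of an isochronous melody, and classifies a disyllabic sequence into one of the three Estonian quantity degrees by its syllable-duration ratio:

```python
# Hedged illustration of the two analysis steps above.
# All numeric sample values are invented for this example.

def duration_deviations(durations):
    """Per-note deviations (percent) from the mean note duration,
    assuming a (nearly) isochronous melody at constant tempo."""
    mean = sum(durations) / len(durations)
    return [100.0 * (d - mean) / mean for d in durations]

# The three Estonian quantity degrees and the approximate
# first/second syllable duration ratios cited above.
QUANTITY_RATIOS = {"Q1": 0.66, "Q2": 1.5, "Q3": 2.0}

def quantity_degree(first_syll, second_syll):
    """Classify a disyllabic sequence by whichever target ratio is
    closest to the measured syllable-duration ratio (a simplification)."""
    ratio = first_syll / second_syll
    return min(QUANTITY_RATIOS, key=lambda q: abs(QUANTITY_RATIOS[q] - ratio))

measured = [0.42, 0.40, 0.44, 0.38, 0.41, 0.45]  # hypothetical durations (s)
devs = duration_deviations(measured)             # devs[0] ~ 0.8 (%)
degree = quantity_degree(0.12, 0.18)             # ratio ~0.67
```

The deviation values are the raw material for comparing expressive timing in folksong with that in notated repertoire; the quantity classification indicates how the verse text could constrain or compete with those timing choices.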
2. MATERIAL
Seven one-voiced folksongs from the repertoire of the female singer LK were used for analysis in this study. All recordings had been made in 1937.
3. SEGMENTATION
From the structural point of view the shortest meaningful sound events in the runic songs are syllables (defined on the phonetic basis), or notes
(defined on the basis of the melody). Provided that the number of syllables in a verse line equals the number of notes, the question would be: Do
the boundaries between successive syllables in old folksongs coincide with the boundaries between successive notes? The answer to this question
need not necessarily be affirmative. Sundberg (1999a) argues that in sung performances, the tone (note) is expected to start with the onset of a
vowel. The use of quantity in spoken Estonian, however, involves differentiating the length of vowels as well as consonants. Thus the lengthening of a stop consonant yields a geminate which, by definition, consists of two parts belonging to neighbouring syllables. The problem Sundberg (id.) points to concerns the segmentation of Estonian words, e.g. [saakki]: either into [saak-ki] as two syllables, or into [(s)aakk-i] as two tones (notes).
The determination of the onset of tone as coinciding with the onset of vowel, however, seems to be related to the theory of P-centers in phonetics
(see e.g. Pompino-Marschall, Tillmann and Kühnert 1987). According to this theory, in alternating sequences of monosyllables the perceived onset
(P-center) of a syllable as a rule does not correspond to its acoustic onset. Generally, the syllable onset (the beat) is highly correlated with the onset
of syllable nucleus (the vowel), while being somewhat displaced as a function of both the initial consonant(s) and the length of rhyme.
The method of segmentation adopted by Sundberg would be hard, if not impossible, to combine with the theory of quantity in spoken Estonian
(Lehiste 1997). According to Lehiste's theory, quantity in spoken Estonian utterances is determined on the level of disyllabic sequences because of a strong tendency of these sequences towards isochrony. For the disyllabic sequence there are three contrastive duration degrees, called the short, the long and the
overlong. The three durational degrees may be applied to the first syllable in a disyllabic sequence, while the duration of the second syllable
depends on the duration of the first. If the first syllable is longer, the second syllable must be shorter and vice versa. Therefore the best device for
an acoustical description of the functioning of the Estonian quantity is the duration ratio of the syllables in disyllabic sequences (see above).
The present study has adopted the 'phonetic' method of segmentation, which establishes boundaries according to the syllable structure of Estonian.
Table 1. Covariance analysis of sound event durations in seven old Estonian folksongs by a single female performer. N is the total number of
syllables (notes) in each song. r2 is the coefficient of determination which measures the amount of sound event duration variability in each song
that can be accounted for by the total effect of seven individual variables. Columns 4 to 10 present the statistical significance level of the effect of each variable on the sound event duration in each song. In some of the songs the effect may be non-significant (n/s), or the variable not applicable (n/a).
Song      N      r2      Significance level of each variable (columns 4-10)
(1)       (2)    (3)     (4)              (5)          (6)        (7)             (8)      (9)      (10)
                         phonol syll      metric pos   mel peak   # of phonemes   dotted   final    mel charge
                         (exp syll durat)
leskim    204    .128    .0021            n/s          n/s        n/s             n/a      n/s      n/s
läksinm   367    .400    n/s              .0001        n/s        .0001           .0001    .0405    n/s
minulk    284    .396    .0073            .0014        n/s        .0001           n/a      n/s      .0295
vendas    493    .168    .0267            .0001        n/s        .0001           n/a      n/s      n/s
peren     318    .515    .0507            .0001        n/s        .0001           .0001    n/s      n/s
Overall   1837   .298    .0008            .0001        n/s        .0001           .0001    n/s      n/s
Covariance analysis of the material was performed in order to estimate the influence of the selected variables on sound event durations. A summary of the results appears in Table 1.
The effects of four of the seven variables on sound event durations were highly significant. The variables concerned are metrical position (strong or weak), the number of phonemes per event, deviations in the score from an isochronous sequence (p< .0001 in all three cases) and the phonological length of the event (short or long, p= .0008). The effects of three variables, the melodic peak, the final note of the line,
and melodic charge, were not significant.
The covariance analysis model is capable of accounting for an average total of 30 per cent of sound event duration variance in the seven songs
studied. The percentage varies across songs. It reaches the highest value of 52 per cent in the song 'Peren' and the lowest value of 17 per cent in the
song 'Vendas'.
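The kind of model summarised in Table 1 can be sketched as an ordinary least-squares fit; the data below are synthetic (invented coefficients and noise), not the folksong measurements, and serve only to show how the coefficient of determination r2 is obtained:

```python
import numpy as np

# Sketch: regress sound event durations on two predictors and compute r2.
rng = np.random.default_rng(0)
n = 200
metric_strong = rng.integers(0, 2, n)   # 1 = ictus position, 0 = off-ictus
n_phonemes = rng.integers(2, 6, n)      # phonemes per sound event

# Synthetic durations (ms): longer on strong positions and with more phonemes.
duration = 300 + 40 * metric_strong + 25 * n_phonemes + rng.normal(0, 30, n)

# Design matrix with intercept; ordinary least-squares fit.
X = np.column_stack([np.ones(n), metric_strong, n_phonemes])
beta, *_ = np.linalg.lstsq(X, duration, rcond=None)

fitted = X @ beta
r2 = 1 - np.sum((duration - fitted) ** 2) / np.sum((duration - duration.mean()) ** 2)
print(round(r2, 2))  # proportion of duration variance explained by the model
```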
An earlier finding (Ross and Lehiste 1998), according to which metrically strong, or ictus, positions of the line are performed longer than the
metrically weak, or off-ictus, positions, was confirmed by the covariance analysis. There seem to exist at least two possibilities for interpreting this
result. When projected against the background of the Kalevala metre theory (Lippus 1995), it suggests that the partly quantitative nature of the
trochaic metre, as described on the 'phonological' level, is indeed acoustically realized in runic songs. Application of the swing rule (Inegales)
should in this case be specific to the runic song tradition. If, however, the swing rule also operates in Friberg's (1991) set of rules intended to
simulate the performance of a musical idiom different from the runic songs, it would suggest that making stressed positions longer and unstressed
positions shorter is a more universal cognitive principle in the musical performance, specific neither to the old Baltic-Finnic folksongs nor to the
7. CONCLUSIONS
Seven generative rules affecting sound event duration in musical performance (Friberg 1991) were examined in order to determine their suitability
for modelling the performance of old Estonian folksongs. Three of the seven rules were rejected because of their irrelevance for the musical tradition examined. The remaining four rules were complemented by three additional ones derived from the nature of articulatory production in singing, the prosodic description of the Estonian language, and the specifics of the old folksong performance tradition. The influence of seven variables on the acoustical duration of sound events in folksong performance was investigated by means of covariance analysis. Of the four original rules adopted from Friberg's (1991) set, the Inegales rule, which makes metrically strong positions in isochronous melodies longer
and weak positions shorter, was found to apply in runic songs. We did not find evidence of the effect of three rules: Melodic Charge, Faster Uphill
and Phrase. The Melodic Charge rule is expected not to work because of weak tonal structure in the runic songs. The Faster Uphill rule is expected
not to work because of the relatively narrow ambitus of melody in those songs. Two of the additional variables, which were formulated ad hoc and
were not present in Friberg's set, relate to the prosodic characteristics of Estonian speech. Their significant influence on sound event durations in
runic songs, together with the negligible effect of some of Friberg's rules, suggests a pronouncedly speech-like character of the analysed folksongs.
The latter retain a number of characteristics stemming from speech prosody, while failing to evidence other characteristics thought to be
specifically musical.
ACKNOWLEDGEMENTS
We wish to thank Meelis Mihkla of the Institute of the Estonian Language, Tallinn, for making it possible to use the text-to-speech Estonian
synthesis software for prediction of expected syllable durations in texts, Professor Ene Tiit of the University of Tartu, for help in statistical
processing of data, and Professor Ilse Lehiste of the Ohio State University, for productive discussions on many aspects of this work.
REFERENCES
Friberg A (1991). Generative rules for music performance: A formal description of a rule system. Computer Music Journal 15, 56-71.
Gabrielsson A (1999). The performance of music. In D Deutsch (Ed). The Psychology of Music. San Diego et al: Academic, pp 501-602.
Laugaste E (1989). Vana Kannel VI. Haljala regilaulud (Old Folksongs from Haljala District, in Estonian, 2 vols). Tallinn: Eesti Raamat.
Lehiste I (1997). Search for phonetic correlates in Estonian prosody. In I Lehiste and J Ross (Eds). Estonian Prosody: Papers from a Symposium.
Tallinn: Institute of Estonian Language, pp 11-35.
Lippus U (1995). Linear Musical Thinking. A Theory of Musical Thinking and the Runic Song Tradition of Baltic-Finnish Peoples (= Studia
Proceedings paper
DIANA SANTIAGO
Assistant Professor, Department of Applied Music
Universidade Federal da Bahia, Brazil
Researchers' conceptions of meaning in music do not coincide. For some of them, musical meaning is found at the level of structural description; for others, musical meaning is something much broader in scope (Sloboda, 1998, p. 25). Besides, several layers of meaning may be ascribed to a piece of music (Dunsby, 1988, pp. 217-218).
The discussion directs us to the very nature of the musical event, and there is no consensus on its basic features either. For some, music is comparable to a language (Sloboda, 1990, p. 65), while others do not believe there is such a thing as a musical grammar (Dempster, 1998). As a performer, I have been intrigued by two questions. What is it that performers struggle to convey in the music they play? What are the necessary steps for building an appropriate musical performance?
The construction of a theory of musical performance is still under way. Although studies on the nature of such performance conducted by music psychologists during the past decades have shed much light upon it, there is still a vast field to be explored. Studies on musical understanding, for instance, usually focus on the listeners (Shaffer, 1995, p. 18). Furthermore, these studies are usually conducted by psychologists; they should not be neglected by performers, who could themselves stress aspects overlooked by other researchers. Certainly a better understanding of the performing action itself would greatly benefit performers and instrument teachers.
This paper aims to present a practical view of the study of meaning in music, a view that details the aspects which the author, herself a performer, took into account while
practising the piece. It is expected that the findings will provide not only a better understanding of the way an individual performance was conceived, but also may be applied to
the elaboration of other performance plans.
The numbers indicate the measures, and the vertical lines demarcate relevant structural points. The dotted lines indicate both the half and the golden section of the piece. Circles,
squares, and the rectangle represent the main thematic material. Some tonal references appear below the horizontal line, while the small dashes represent notes in the bass register
which assume pedal function.
Well-balanced proportions indicate that symmetry was a guiding principle for the composer. The half of the piece is surrounded by equivalent portions of music. The 40
millimetres of mm.[18-37] correspond to the 41 millimetres of mm.[38-58], and the 34 millimetres of mm.[1-17] correspond to the 32 millimetres of mm.[59-74].
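The proportion measurements above can be checked with a few lines of arithmetic; the millimetre values are those reported in the text:

```python
# Sketch: checking the proportions reported for the score, measured in
# millimetres of printed music (values taken from the text above).
sections = {"mm.1-17": 34, "mm.18-37": 40, "mm.38-58": 41, "mm.59-74": 32}

total = sum(sections.values())            # total printed length in mm
half = total / 2
golden = total * (5 ** 0.5 - 1) / 2       # golden section, ~0.618 of the whole

# Distance from the start to the boundary between mm.[37] and mm.[38]:
to_midpoint = sections["mm.1-17"] + sections["mm.18-37"]

print(total, half, round(golden, 1), to_midpoint)
```

The boundary at m.[38] lies 74 mm from the start, within half a millimetre of the midpoint (73.5 mm), which is consistent with the symmetry argument.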
The form of the piece is clarified by the thematic material that occurs over the D pedal of mm. [1-4, 18-21, 59-62, 71-74]. The melody exposed in the introductory measures
returns at m.[18]. However, transformed at m.[22], this melody participates in what can be considered the development of the piece, mm.[18-58]. At m.[59], the introductory
material is recapitulated, and, at m.[71], it appears in transformation, functioning as a coda.
The predominant texture of the piece is that of accompanied melody. As a matter of fact, the accompaniment is one of its most characteristic features. Arpeggios circulating on the
low and medium registers of the instrument create a wave-like movement easily associated with the title of the piece. This type of accompaniment is very common in barcaroles.
Commentary on the melodic material will be given after examining the work's tonal structure.
There is not a single authentic cadence in the piece. Resources such as evaded cadences (mm.[10-11, 12-13, 16-17, 33-34, 46-47]) and augmented sixth chords (mm.[17, 58, 72]) are used to lend continuous mobility to the harmonic progression. This mobility, however, is contrasted by the static quality characteristic of added sixth chords. Their frequent use (see Example 1) lends a special colour to the sonority of this piece. Although they do include a dissonance of a second or a seventh (as they can be seen as first-inversion seventh chords), added sixth chords have their static character determined mostly by the absence of the interval of a tritone. Example 1 makes it easy to observe that, among the versions used in the piece, only one has a tritone (D-F-A-B).
Example 1. Added sixth chords
Each one of the twelve pitches of the octave (in the system of equal temperament) is represented in the piece by a chord or a tonal area. In major or minor versions, or in both,
some pitches receive more projection, although a real tonic quality is not attributed to any one of them. For instance, D major assumes a prominent position because it is located
over the pedal that begins the piece and delimits important formal divisions, but it is not established as a tonality. Prominent at mm. [13 and 53], C major is also not established.
The study of the proportions of the piece lets us perceive, however, that although E major does not appear on the pure triadic version which would convey to it the stability of
tonic, it comes out as the principal candidate to what can be seen as the "focal point of the tonal structure". Four arguments favour this candidature:
1. pedal E in the middle of the structure, followed by E major at m.[39],
2. V13 of E as the final sonority,
3. modulation to B major - which enhances the dominant of E - in symmetrical disposition in the structure, for mm.[30 and 43] are equidistant, respectively, from the
beginning and the end of the piece, and
4. A flat at mm.[33 and 46], which enharmonically enhances the third of E major.
It is worth observing that the enhancement of these areas represents an expansion of the pitches that constitute the triad of E major: E - G#/Ab - B. (Other important areas in the
piece will be considered later.)
The main melody, on the top of the texture, confirms the importance of E major. By observing the melodic contour in the first half of the piece, it becomes possible to verify that
the notes of the dominant triad of E (B - D# - F#) work as melodic points of reference. The initial F sharp note, reiterated at m.[7], after ascending to the note D sharp in the treble
register at m.[15], prevails again at m.[16], while the note B is emphasised at m.[30]. In the second half, departing from F sharp at m.[38] in the middle register of the instrument,
the melodic line ascends to the highest note in the piece, the C of m.[53], which may be seen as lying a half-tone above the dominant of E. F sharp comes back at m.[59]
and, after being reiterated at m.[71], ascends to the treble register at m.[73] to compose the dominant of E in the final sonority of the piece.
The contour of the main melody is full of surprises, and this seems to explain the harmonic progression with all its evaded cadences and unforeseen resolutions. It is worth
observing the arrival of the G note on the downbeat of m.[13], when the E flat note should resolve the tension of its leading note D in the preceding measure. Still in regard to the linear treatment, there is an aspect which ought to be commented upon. It consists of melodic fragments that create a subtle counterpoint to the main melody. These fragments, which
originate in the accompaniment, sometimes interweave with the main melody. This can be clearly observed, for instance, from m.[27] to m.[28]. The melodic gesture of the left
hand from the first beat to the second at m.[26], after being imitated on the two following beats, appears again and, by acquiring new melodic strength, interweaves with the main
melody on the C sharp note of m.[28]. This type of counterpoint is much used by Chopin, a composer who was certainly studied by Velasquez.
The melodic character given to the accompaniment creates an uninterrupted musical discourse. At m.[8], for instance, while the right hand sustains a melodic C sharp, the
accompaniment gains melodic character when, at m.[9], it reaches the G sharp of the main melody. This sense of continuity searched by Velasquez in the piece is what mostly
suggests that he might have been in contact with Wagner's "endless melody". It would be necessary to investigate whether his contact with the music of Wagner was direct or indirect. Velasquez is said to have been the first Brazilian composer to be seduced by the new harmonic conceptions at the beginning of the twentieth century. He became acquainted with them through the works presented by Alberto Nepomuceno (1864-1920) in the concerts of the Exposição Nacional which, in 1908, celebrated the centennial of the opening of Brazilian ports during the Portuguese Kingdom. In Devaneio, diverse factors such as the absence of authentic cadences, evaded cadences, unresolved tensions,
A comparative study of both pieces transcends the limits of this paper. Nevertheless, some aspects will be approached here, because the understanding of Devaneio's tonal
organisation is altered after the tonality of Sobre las olas is examined.
With G major as the tonal centre, the piece by Rosas is made of 14 portions, with 32 measures each. As can be seen in Graph 2, the thematic material in these portions is repeated
(AA, BB, CC, DD, EE, AA, BB), organising 7 sections of 64 measures each. Sections A and B are marked in the score as Waltz 1, and the sections C and D as Waltz 2. Section E is
not numbered. Waltz 1 returns at m.[325]. The sole interruption in the regularity of the piece's organisation is due to the insertion of 4 measures with a modulation from G major to
E major between Waltz 1 and Waltz 2.
From some angles, Sobre las olas is extremely simple. From others, it displays very original traits. One of these traits is its tonal organisation. G major is used for Waltz 1, for the D section, and for the recapitulation of Waltz 1 at m.[325], i.e., at both extremes and in the middle of the structure. The use of C major in section E does not bring any novelty, for C is the subdominant of G. In section B, however, there occurs an unexpected tonal event: eloquently announced by its individual dominant, E major is established at m.[133]. But, instead of maintaining it, the composer creates a modulation to B major and ends the section in this tonal area. Would not the unusual procedure of beginning the section in one tonality and concluding in another have suggested to Velasquez the originality of his Devaneio? This question has sparked my curiosity about the relationship between the two works
● at m.[23], E flat functions as the dominant of A flat, but resolution in this tonality does not occur;
● at m.[32], integrating the circle of fifths F-B flat-E flat-A flat, E flat acts as the dominant of A flat, which occurs at m.[33];
● at m.[36], as happened at m.[23], E flat functions as the dominant of A flat, which does not appear; what appears instead, both at mm. [24-25] and at mm. [37-38], is a bass
E followed by E major at mm.[26 and 39], respectively; this leads to the understanding that, on a large scale, E flat acts as an enharmonic expansion of the leading tone in E
major;
● at m.[45], E flat plays the same role it had played at m.[32], since mm.[31-34] are equivalent to mm.[44-47];
● at m.[53], E flat is inserted in the accompaniment and hinders the establishment of C major;
● in the last two measures, the enharmonic transformation in D sharp confirms that the events in E flat are nothing but an expansion of the leading tone in E.
Concerning E flat, it is also worth mentioning that the distance from the beginning of the piece up to the event at m.[23] (22 measures) is the same as that which separates the event at m.[53] from the end of the piece. Thus, the placement of E flat and B major allows one to perceive the principle of symmetry acting in the tonal organisation.
Would symmetry be emulating the symmetrical organisation of the sections in Sobre las olas? Would the above mentioned hemiolic effects in Devaneio be emulating the explicit
hemiola of the portions in E major and C major (see Example 2) of Sobre las Olas? Would the expressive melodic line of Velasquez be emulating the so popularised melodies by
Juventino Rosas? An answer to these questions demands further research.
References
APPLEBY, D. P. (1989). Music in Brazil. Austin: University of Texas Press.
DEMPSTER, D. (1998). Is there even a grammar of music? Musicae Scientiae, 2(1), 55-65.
DUNSBY, J. & WHITTALL, A. (1988). Musical analysis in theory and practice. London: Faber Music.
HOWAT, R. (1983). Debussy in proportions: A musical analysis. Cambridge: Cambridge University Press.
Glauco Velasquez. http://pub2.lncc.br/dimas/velasquez
Proceedings paper
Statement of Purpose
This study is a comparison of two frames of mind belonging to those who are drawn to music and to
business. It is hoped that insights will be yielded regarding the psychodynamics of object choice and
usage, particularly in the area of musical creativity.
College music and business majors are administered three instruments: one measuring a dimension of ego functioning referred to as boundaries; one measuring relational tendencies; and a third measuring the ability to find connections between sets of words. The extent to which artists, as compared to those engaged in business studies, have greater access to their emotions, are more open to accessing connections between ideas, are less adept socially, and are more insecure and inclined to emotional vulnerability, will be addressed. Stereotypes of the musician will be addressed via concepts of boundary thinness,
insecurity and attachment issues, and the capacity for heuristic thinking. The stereotype of the
business-person who distances himself from emotions in addressing the bottom line and impersonal
material exchange, who places a premium on efficiency in his operations and relationships, will be
tested with regard to boundaries, social competency, flexibility in association and openness to inner
experience.
The impulse for this study began with the question "why are artists creative?", relevant irrespective of
talent or quality of output, and addresses the urge to create, to manifest the "aesthetic of the
personality" (Bollas, 1992), to seek or create environments, relationships and occupations which
permit this. Psychologists and philosophers of many schools have addressed the nature of creativity.
The urge to create, the cultural, social and psychological functions of creativity, the significance of
that which is created, the link between creativity, states of consciousness and psychopathology, the
use of the creative process as therapy - all have been explored. However, most work in the area of
psychodynamics and creativity has been theoretical or anecdotal. There has been a paucity of
empirical work linking personality with creativity from a psychodynamic perspective. I hope to show
that what we do reveals who we are and how we are constructed, through the use of valid
psychological instruments. This study is founded on recent developments in object relations and self
theory which have focused increasingly on use and choice of objects, as in the work of Christopher
Bollas (1992). Additionally, essays on music and psychology such as those by Anthony Storr (1992),
and case studies and empirical research by Janet Dvorkin (1991, 1992) highlight the accessibility of
this area of inquiry.
and through deflected or corrupted strategies embodied in the limited dimension of desire. Eigen
thereby addresses the question of whether character, as akin to calling and true self, or any of the other
phrases used to illuminate the poetic but necessary signifier of a wholly personal essence, can find
expression and harmony in the act of creating, and in so doing reach toward a medium of expression
with some degree of fidelity to that essence. He also raises interesting questions addressed elsewhere
in the literature on intrinsic and extrinsic motivation as to the deflection and compromise encountered,
here by the musician, and he as a subset of humanity, in the inevitable encounter with demands,
agendas and concepts superimposed upon the act of true-self-expression. The marriage of ideas of
Lacan and Winnicott is relevant to this study in its implications regarding the ravages of language as
essential limiters and deflectors of meaning, and the extra- (or "pre-") verbal power of music as a
symbol-world offering privileged access to some manner of authentic self-experience. This view of music is relevant also as a bridge to communal experience, which, again, is far more constrained by the limiting and refractory nature of language. These are potent ideas which are compatible and
complementary with the operational ideas of boundaries, cognitive strategies, object relations and self
or character used in this study, and the challenge to establish sufficient access to the energies of that
hypothetical true or authentic self.
The ways in which one does this will be studied using the Bell Object Relations/Reality Testing
Inventory.
Association
Association refers to the connection between signs or symbols. This connection may be arrived at
"algorithmically", through reference to a formula. As such, it has both a predetermined solution as its
end product and a finite series of steps established to reach that end.
Connections may also be made heuristically. Heuristic describes a non-linear process by which
responses or solutions are arrived at through unpredictable, idiosyncratic means. Heuristic refers to
the process by which thoughts, images, fantasy and feelings emerge uniquely for each individual,
rather than by reference to a fixed, external source or algorithm for deriving meaning.
A test of word association will be used to test the hypothesis that musicians have greater access to
obscure connections, due to their ability to heuristically use or create unfamiliar cognitive paths - a
hallmark of creativity.
Literature Review
Little empirical work has been done regarding the psychodynamic characteristics of artists, alone or in
comparison with people in other fields. However, the following reviews some significant work supporting the relevance of this study from different perspectives. Coney and Serna, who use an information processing perspective, state that the common thread of thought on
creativity, from Aristotle, Locke, through Ribot, Hollingsworth, and Freud and contemporary
information processing researchers and philosophers of aesthetics, was that "the essence of creative
thought inheres in the process of bringing disparate mental elements together to form new and useful
combinations." (Coney and Serna, 1995).
Coney and Serna discuss a number of information processing theories which would seem to be
consistent with psychodynamic explanations. One such, put forth by Lewis and Anderson (1976), has
been dubbed "the fan effect", and is based on Anderson's (1976) ACT model. This suggests that, in
information processing parlance, the capacity for new associations is inversely proportional to the
number of links supported by a given knowledge structure, or which connect a given "node" to other
nodes. More creative people have access to a greater number of source nodes, which can communicate
via a greater number of different pathways or links, rather than all cognitive tasks being supported by
a limited number of links from a limited number of source nodes. The capacity to access, modulate
and interrelate greater and more vivid modes of experience is precisely one of the goals of
psychodynamic therapy and a measure of psychological adaptability. While numerous secondary
hypotheses of Mednick's have not been supported, the basic one, that creative people were able to
produce significantly more associations to target stimuli than non-creative people, has been supported
in a variety of contexts and supports the use of the Remote Associates Test in this study.
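The fan effect summarised above can be illustrated with a toy calculation. The particular formula used here (S - ln(fan)) follows the later ACT-R convention for associative strength and is an illustrative assumption, not a claim about the Lewis and Anderson (1976) model itself:

```python
import math

# Sketch of the "fan effect": the activation a source node passes to each
# associate decreases as the node's fan (its number of outgoing links) grows.
def associative_strength(fan, S=2.0):
    """Strength of one link from a node with the given fan (ACT-R style)."""
    return S - math.log(fan)

# A concept linked to few facts supports stronger individual associations...
print(associative_strength(2))   # fan = 2
# ...than one whose activation is divided among many links.
print(associative_strength(10))  # fan = 10
```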
Keith Sawyer (1992) explores, via interviews with jazz improvisers and citations of the creativity and
cognitive-processing literature such as Csikszentmihalyi, Simonton, Campbell, Hadamard and Rothington et al., the subjective and functional aspects of playing music. He refers to the two-stage
model of creativity which divides the process of creativity into the ideation and selection phases,
corresponding to the unconsciously generated material and the process and criteria by which it is
edited and selected. Sawyer explores the subjective experience of this in which the non-intellectual or
seemingly spontaneous, "non-conscious" generation of musical ideas in improvisation is then filtered
through partly-conscious form, personal criteria and social necessity (the ongoing effort of the band
and the response of the audience). The selection and refinement process applied to raw musical
impulse describes what would be subsumed under ego functioning in the psychoanalytic paradigm.
An important stimulus for this study is the work of Mary Louise Serafine (1988) in her exploration of music as a cognitive faculty inherent in the biological mind. Dr. Serafine explores the social functions of music, as well as studying the correspondence between learning, music theory and the experience of making and hearing music, linking epistemological concerns and developmental psychology. She explores music as an activity of mind akin to language, its development in the child, as well as the nature of abstraction, time, emotion, collaboration, and the nature of vocal and
non-vocal musics. She argues against the notion that music is a human appropriation of physics and
thus only a method, technique or artifact.
Boundaries and Relational Phenomena
One explicitly psychodynamically-oriented study was done by Juni et al (1986). The authors correlate
preferences for selected passages of music noted for a distinct emotional tone or program with the
subject's Rorschach percepts. Subjects were first administered the Rorschach, which was standardized
to 25 uniform responses, and scored for anal, oral, sadistic and phallic fixations, according to Juni and
Fischer's (1985) expanded lexical word count. Musical preferences were scored on a 1-4 score, and
correlated to the tonalities of each piece of music presented to subjects. The study identified oral
fixation issues correlating to the preference for minor tonality music; minor tonalities are commonly
experienced as sad or evoking feelings of loss or passage. This is taken to illustrate the emotional
valence of music and its influence upon affective, psychodynamic dispositions. These results lend
strength to the expectation that oral-stage attachment issues are reflected in those more susceptible to
and seeking strong musical-emotional valences.
In Sousa's study The Relationship Between Boundary Permeability of Psychoanalysts and their
Attitudes Toward Countertransference (1997), Dr. Sousa points out that the capacity for mature object
relations is predicated upon a clear differentiation between internal and external reality - understood
as a chief function of the ego. She cites work from theoretical positions including those of Mahler,
Pine and Bergmann (1975), Winnicott (1971), Searles (1986), Hartmann, Kris and Loewenstein
(1949), and Blatt and Wild (1976), supporting the notion that psychic functions are interdependent.
However, of immediate concern for this study is Sousa's finding of correlations between measures of
"insecure attachment" on the Bell Object Relations/Reality Testing Inventory, and thin inner
boundaries on the Hartmann Boundaries Questionnaire. This correlation and relevance of these
variables in the very different context of her study supported the choice of these two instruments for
this study.
Kohut (1955) describes music in terms of regression in the service of the ego, and implies functioning
across boundaries:
"Music...as an extraverbal mode of mental functioning, permits a specific, subtle regression to
preverbal, i.e. to truly primitive forms of mental experience while at the same time remaining socially
and aesthetically acceptable".
Kohut's comment illustrates the psychoanalytic notion that the ego brings into cooperation the drives,
external criteria and the need for social conformity. This implies that the successful regression in
service of the ego depends upon "social acceptability", involving managing relational necessity in a
pre-verbal mode - straddling developmental levels without violating reality-testing and social
participation. It also raises the issue of compromise between conflicting agencies, a way of making
the unacceptable acceptable and giving structure and expression to what is chaotic and primitive - a
questionable understanding of the creative process.
A few entries in the literature address clinical links between object relations and music. Dvorkin
(1991) offers clinical examples of the link between pathology, therapeutic process, and affect on the
one hand, and musical creation, affective tone of the music and verbal reflection upon the music in the
treatment of a 17-year-old borderline girl. She supports the notion of music as a transitional object
facilitating and structuring the emergence and expression of primitive contents. Types and ranges of
tonality and musical dynamics are linked to particular affective and self-states - precisely those areas
left open by Serafine's work.
Dvorkin (1992) explores the use of music as a transitional form of non-verbal communication and
social involvement among high school students. She finds higher degrees of capacity for trust and
intimacy among music students than a control group, with evidence of higher developmental levels,
but no greater capacity for individuality, suggesting that this supposed relational capacity is dependent
upon the ego-binding and interpersonally-connective functions of the music. This raises the question
of whether the engagement in music provides an ego-binding function at the same time relying upon a
degree of ego permeability, which manages to exist without a significant increase in pathology or
distress.
However, pathological considerations are not the focus of this study. More immediately relevant is the
question of how and why one engages with - or is engaged by - the materials of one's life, consistent
with Bollas' idea of the personal idiom. This is tested by the Bell Object-Relations/Reality Testing
Inventory (BORRTI). The BORRTI reveals general, clearly defined and operationalized relational
patterns and tendencies, as well as indicating pathological extremes where they are evident. Of
greatest relevance here are attachment issues, social competence, egocentricity and alienation.
Instruments
Hartmann et al. (1981) laid the groundwork for the Boundaries Questionnaire in studies of nightmare
sufferers. Two studies compared nightmare sufferers with non-nightmare vivid dreamers and
non-nightmare non-vivid dreamers, giving subjects at least two psychiatric interviews, the Rorschach,
MMPI and five TAT cards. The results are as follows: Compared with controls and with population
norms, nightmare sufferers dreamed in greater length and frequency. They displayed greater fluidity
with respect to the content and transformative quality of their dream images, self- and
other-representations and emotions; they shifted readily from one dream into another or awakened
from one directly into another. They reported difficulty knowing whether they had awakened or not
after nightmares or other intense dreams. They reported more drowsiness and/or daydreaming, with
more "daymares", or reverie drifting into unpleasantness.
During interviews, the nightmare subjects were reported to have free-associated more readily, taken
more time offering detailed answers and many more associations. They were "immensely trusting and
open...sharing all kinds of intimate detail...much more so than the control groups". They reported
more conflictual relationships in their personal lives. They all described childhood and adolescence as
difficult or complicated, more so than control groups. However, there was no greater incidence of
trauma or abuse. Nearly all nightmare sufferers reported involvement in the arts, teaching or forms of
healing or therapy. No nightmare sufferers (26 out of n=50 subjects) were in blue-collar or 9-5 white
collar jobs.
Hartmann points out that although descriptions suggest psychopathology, based upon
symptomatology fewer than one third of 26 nightmare sufferers qualified for formal DSM-III
diagnosis, despite reported intensity and chronicity of nightmares. Of these, most were tentative Axis
II diagnoses, two were possible schizophrenics and none were anxiety disorders.
Nightmare sufferers showed distinct MMPI characteristics, with significant elevations on psychotic
scales (Pa, Pt, Sc, Ma) but no elevations on the "neurotic" side associated with depression and
anxiety. Hartmann states that compared to controls, this does not indicate a "sick" population. He also
points out that elevations on psychotic scales are equally characteristic of borderline patients, people
with psychotic diagnoses and art students - the latter having no greater incidence of serious diagnoses.
On the Rorschach, nightmare sufferers showed more primary process and vivid content in their
percepts, but did not differ from the other groups on any standard Exner measures. However, with
respect to "permeable boundaries", following the work of Blatt and Ritzler (1974) and Fisher and
Cleveland (1958), the nightmare group scored significantly higher (p < .01).
On TAT cards targeting interpersonal aggression and hostility, nightmare sufferers showed no
elevation. The highest levels of hostility and aggression were displayed by male
non-nightmare/non-vivid dreamers. This suggests the relevance of further study regarding the effects
of repression.
Hartmann summarizes the findings as indicating that nightmare sufferers, who tended to be in the arts
and helping professions, were no more anxious, depressed or hostile than controls, and displayed only
slightly greater incidence of specific, well-contained pathologies. He reports that the interviewers' and
testers' descriptions of the nightmare subjects frequently contained words and phrases like
"vulnerable", "undefended", "vivid" (with respect to both verbal imagery and behavior), and
"tendency to merge". Hartmann states that these descriptions yielded the term "thin boundaries".
In order to systematically study the boundary phenomena emerging from the dream research,
Hartmann, et al. devised the 145 item Boundary Questionnaire. Hartmann distinguishes between inner
boundaries and outer boundaries. Inner refers to phenomena of feeling and dreaming and the ways in
which thinking, feeling, and particular thoughts and feelings are separate or continuous with each
other. Outer refers to tendencies, preferences and opinions about the outside and social world.
Hartmann suggests two areas of inquiry for this study, reporting that subjects with thick inner and thin
outer boundaries had little psychiatric difficulty and close ties to significant people in their lives.
Conversely, those with thin inner and thick outer, including artists, had significantly more psychiatric
difficulty, whereas successful artists had a more even balance of inner and outer thinness, perhaps
indicating less vulnerability to chaotic primary process and a more efficient handling of inner and
outer reality.
The Hartmann Boundaries Questionnaire is outlined further in Appendix A.
BORRTI
Object relations are measured utilizing the Bell Object Relations-Reality Testing Inventory. Bellak,
Hurvich and Gediman (1973) devised an ego-functioning-oriented clinical interview to identify
aspects of object relations and reality testing, the thrust of which had many points of convergence
with key boundaries concepts. Reality testing consisted of "reflective awareness", "accuracy of
perception", and the "ability to distinguish between internal and external" experience. Object
relations are derived from the quality of relationships and self-experience in relation to others.
Bell, Billington and Becker (1985, 1986) created a self-report, true-false measure which would
address the areas aimed at by Bellak, et al. Their inventory consists of subscales assessing Object
Relations (OR) issues of alienation (Aln), insecure attachment (IA), egocentricity (Egc), social
incompetence (SI), and Reality Testing issues (RT) of reality distortion (RD), uncertain perception
(UP) and hallucinations and delusions (HD).
The BORRTI subscale most relevant to this study is the insecure attachment scale (IA), which Bell
(1991) points out is the most likely of the scales to be elevated in high-functioning individuals. This
serves the dual purpose for this study of reducing the influence of pathology per se as a focus, and of
identifying intrapsychic issues hypothesized to be characteristic of creative people in any number of
fields. Individuals with elevations on this scale are likely to be sensitive to rejection, criticism and
threats to closeness. The sensitivity of the BORRTI to indicators of pathology, should these be
relevant in the sample tested, will serve additionally as a control mechanism.
The BORRTI is outlined further in Appendix B.
Remote Associates Test
Sarnoff Mednick (1962) studied subjects' associations to stimulus words (with predetermined correct
solutions) as a measure of creativity, looking at the clustering of correct associations to both obvious
and obscure stimulus words. He states that his method is not intended to identify any particular
creative process in any particular field, but rather to tap into a set of processes which underlie all
creative thought. Mednick's original published use of the instrument achieved reliability scores of .92
among women and .91 among men of college age. In this study, it is used as a general measure of the
cognitive capacity to access conceptual connections; the distinction between algorithmic and heuristic
problem solving is relevant to this ability, with respect to the ability to find the idiosyncratic
connections necessary for the creative process.
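The two measures used for Hypothesis 1 - number correct and number attempted - can be sketched as a simple scoring routine. This is an illustrative sketch only: the items, the answer key and the function name are invented for demonstration and are not Mednick's actual test materials.

```python
# Hypothetical answer key for RAT-style items: each item is a triple of
# stimulus words with one predetermined correct associate.
ANSWER_KEY = {
    ("cottage", "swiss", "cake"): "cheese",   # classic example of an RAT-type item
    ("cream", "skate", "water"): "ice",
}

def score_responses(responses):
    """Count correct and attempted responses for one subject.

    `responses` maps each item to the subject's answer, or None if the
    item was left blank within the 30-second limit.
    """
    correct = sum(
        1 for item, answer in responses.items()
        if answer is not None and ANSWER_KEY.get(item) == answer.strip().lower()
    )
    attempted = sum(1 for answer in responses.values() if answer is not None)
    return correct, attempted

subject = {
    ("cottage", "swiss", "cake"): "cheese",
    ("cream", "skate", "water"): None,   # blank: not attempted
}
print(score_responses(subject))  # -> (1, 1)
```

A "Difficult" versus "Easy" breakdown, as in Table 1, would simply apply the same count to two disjoint subsets of items.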
To clarify algorithmic and heuristic: The former is, by definition, a formula for discovering or
clarifying a correct solution through the least number of pre-determined steps and is thus reductive in
nature, and non-creative. Summarizing Mednick, heuristic refers to an idiosyncratic or novel approach
to a problem and presumes that neither the method nor the result are predictable or predetermined, but
depend upon access to personal associations rather than learned method or technique. With respect to
the RAT items, it is expected that the "Difficult" items require more heuristic thinking, based on a
fluidity of boundaries allowing an ease of association and an absence of obvious or ready-made
connections between words. Though there has been argument in the literature as to whether the RAT
tests creativity per se, the abilities it taps - to access and synthesize both established forms and
methods and personal image and idea - are part and parcel of creative work.
Methodology:
40 subjects participated in this study. These were major-declared students, recruited from music and
business classes at two neighboring Long Island universities of similar demographic constitution.
They were approached, with the cooperation of professors and department chairpersons, via brief talks
given to classes, and via sign-up sheets, approved by department chairpersons. Incentive was offered
in the form of 3 lottery style cash prizes to be paid at the completion of the study. Students were
encouraged to attend group testing sessions held in university classrooms. Those who were unable to
do so were tested individually or in pairs at times and places selected for minimum distraction.
Procedure did not differ between settings, except for waiting for all subjects in group administration to
finish one instrument before commencing with the next. Each subject was administered a questionnaire
packet consisting of a consent form, a demographic questionnaire, the Remote Associates Test, the
Boundaries Questionnaire and the BORRTI. Each RAT item was allotted 30 seconds for completion.
Subjects were informed that the experiment concerned personality and career choice. They were asked
to read and sign the consent form and complete a brief demographic questionnaire. The timed Remote
Associates Test was administered next. After instructions, subjects were given up to thirty seconds for
each of the word association problems. They were then instructed on the procedure for the Hartmann
questionnaire, followed by the BORRTI questionnaire, neither of which were timed. The examiner
was available for questions and debriefing subsequent to testing.
RAT items were hand scored, and Boundaries and BORRTI questionnaires were computer-scored and
analyzed using dedicated software.
Pilot Hypotheses
Hyp.1: On the Remote Associates Test, music majors will achieve a greater number of correct
answers than business majors, particularly on the Difficult test items. As an exploratory hypothesis, it
is predicted that this will be accompanied by a greater willingness to try, indicated by higher number
of attempted responses.
The "heuristic" thinking, or freer associating, necessary to complete the difficult items will be
easier for the music majors, both because of the previously discussed personality factors and
because, given their continuous exposure to the novel and non-reductive challenges associated with
their work, they will be more adept at finding remote or counterintuitive solutions. This hypothesis
follows Mednick's original results to this effect.
Hyp.2: Musicians will have thinner boundary scores than business majors, as indicated by higher
Sumbound scores.
Hartmann's initial research indicated that thinner-bounded individuals were found more often, among
others, in artistic professions. In addition, thick responses correspond to the organized, discrete,
quantifiable, emotionally neutral axis of cognitive style associated with business and finance.
Hypothesis 3:
Musicians will score higher than business majors on the BORRTI measure of insecure attachment.
The functions of music as a special category of transitional object, as previously discussed, would
suggest the proximity of oral and attachment issues, among the other "primitive forms of mental
experience", (Kohut, 1955), such as the omnipotence and pre-verbal expressive functions of playing
music. However, as BORRTI measures are more pathology-dependent than not, and previous findings
(Dvorkin, '92) indicate the adaptive value of musical involvement, this is a tentative prediction.
Hypothesis 4: There will be a positive correlation between boundary thinness and Insecure
Attachment subscales, based upon Sousa's (1993, 1996) findings to this effect. Sousa (1994) found a
significant positive correlation between Insecure Attachment and the Sumbound scale of the
Hartmann Boundaries questionnaire (r= .4276, p < .001), suggesting that people with thinner
boundaries overall will also have issues in the areas described by Insecure Attachment. This may be a
stronger indicator of the idea suggested in Hyp. 3.
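The statistic behind Sousa's reported r = .4276 is the Pearson product-moment correlation. As a minimal sketch, with invented illustrative scores (the data below are not from any study cited here; only the formula is standard):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented scores pairing thinner boundaries (higher Sumbound)
# with higher Insecure Attachment z-scores.
sumbound = [250, 280, 300, 317, 340, 360]
insecure = [-0.5, -0.2, 0.0, 0.1, 0.4, 0.6]
print(round(pearson_r(sumbound, insecure), 3))
```

A significance test on r (the p < .001 reported by Sousa) would additionally require the sample size and a t- or permutation test, which this sketch omits.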
Results
N = 40 (20 music majors, 20 business majors)
CORRELATIONS BETWEEN SUBSCALES: BOUNDARIES AND BORRTI
Hypothesis 1: Music majors will achieve a greater number of correct answers than business majors on
the Remote Associates Test, particularly the Difficult test items. As an exploratory hypothesis, it is
predicted that this will be accompanied by a greater willingness to try, as indicated by a higher
number of attempted responses.
TABLE 1
Means
            TOTAL CORRECT    DIFFICULT CORRECT    EASY CORRECT
MUSIC       9.00             2.10                 6.89
BUSINESS    6.87             2.00                 4.87
            (p = .035)       (p = .425)           (p = .006)
TABLE 1a.
CORRELATIONS BETWEEN SUBSCALES: BOUNDARIES AND REMOTE ASSOCIATES
TEST
Music majors did achieve a significantly greater number correct as indicated by the Total Correct
figures, but these were clustered within the Easy items, with no significant difference found on the
Difficult items. Hypothesis #1 is thus supported, but not with the expected strength. The exploratory
hypothesis is not supported, as there was no significant difference in number of responses attempted.
Hypothesis #2: Musicians will have thinner boundaries than business majors, as indicated by higher
Sumbound scores.
Sumbound
Music: M=317.8
Musicians demonstrated thinner overall boundaries by a wide margin. This finding was consistent
across most boundary subscales with the exception of the Precision and Sensitivity subscales, in
which there was no difference.
Hypothesis #3: Musicians will score higher than business majors on the BORRTI measure of insecure
attachment.
z-score
Music: M=.0863
While the mean score was, at first glance, considerably higher for musicians, this hypothesis is
rejected because the within-group variance left the difference short of significance.
Hypothesis #4: There will be a positive correlation between boundary thinness and Insecure
Attachment.
Discussion
A number of issues point toward further research. One comes from the pathological/non-pathological
distinction. Sousa (1996) documents the correlation between severe psychiatric illness and the very
thin inner/very thick outer boundary profile. While the musicians in this study had thinner inner
boundaries than outer, the differences between musicians and business majors with respect to outer
boundary thickness were far less than for inner boundaries, suggesting that the ego functions of both
groups were intact, consistent with the dismissal of pathology as a factor. In other words, the
populations may be considered well adapted and not, on the whole, pathological. The issue may
be one of degree, wherein a "normal" population of artists has social coping capacities (outer
boundaries) that are serviceable, though less flexible and sophisticated than the inner associative
fluidity responsible for creativity (inner boundaries). This profile can easily be seen as yielding
to pathology when the inner experience becomes overly fluid, undifferentiated and irreconcilable
with outer reality, which is then dealt with in a brittle and inept manner - a description which,
though incomplete, is consistent with psychosis. Measures of egocentricity, alienation and uncertain
perception may reflect both a preoccupation with one's own inner experiences and interpretations and
a resultant sense of otherness, doubt and separateness - an interpretation which seems consistent with the thin inner/thick
outer profile.
The pathology-sensitive BORRTI yields results supporting the predicted links between object
relations and career choice, while indicating no significant psychopathology. Hartmann's findings that
his thin-bounded subjects had more vivid and distressing access to inner, irrational material but
without diagnosable disorders corresponds to this, and suggests that "psychopathology" rests along a
loose continuum of common human experience and may often be a matter of degree, not kind, of
factors producing reality and relational distortion. With regard to Kissen's (1995) work on the linkage
of affects and internalized objects, the fact that the inner boundaries of musicians are far thinner than
outer and that thin inner boundaries as a whole are highly correlated with attachment issues may
demonstrate both the links between creativity and inner fluidity, and that the affects which "drive" the
creative process and motivate the choice of an artistic career derive from the affect and personal
meaning of internalized early object relationships. Given the thin inner boundaries of musicians and
the strong correlation between inner boundary thinness and insecure attachment, the variance-related
lack of significance in the difference between musicians and business majors with regard to insecure
attachment may be considered a statistical anomaly. Additionally, the lack of correlation between
insecure attachments and outer boundary fluidity may suggest higher level ego functions "layered
atop" the oral issues explored in the aforementioned study by Juni et al. (1986), corresponding most
closely to the attachment issues as reflected in the BORRTI. This is a fruitful topic for future research.
An offshoot of this would be the study of Hartmann's (1981) preliminary finding that composers
("pure" artists) scored thinner than instrumentalists ("interpretive" artists).
It may be more difficult to account for the reversal of the inner/outer profile with regard to
performance on the Remote Associates Test than the Boundaries Questionnaire. There was a strong
correlation between performance on the RAT and outer boundary thinness, but no correlation between
RAT performance and inner boundaries. It may be that there is a qualitative difference
psychodynamically between fluidity in verbal and non-verbal associative ability, and that the
object-activating functions utilized by musicians while playing music (i.e. those involving objects
active in insecure attachment issues) tap into pre-verbal psychic territory; both music and enterprise
tap into distinct but related psychological processes. Thus, the "creativity" which the musician uses in
making music is distinct from that which he uses when solving a heuristically challenging word
puzzle, and it is due to a globally, reasonably well-adapted and diversified personality that he is
able to do both. This should not be surprising given recent advances in understanding the neural and
informational mechanism activated in various cognitive processes, such as those of Anderson and
Lewis (1976) discussed earlier.
As previously discussed, a key area for further study involves how character is both a determinant and
a result of choosing "objects" in the broader sense, a choice which involves confronting several
"boundaries", mandates, personal criteria, etc., and modifying them by adapting them to the emerging
structure of the individual character, which is never fixed but is in a dynamic process of fixing itself
and modifying its boundaries through such choices. One can consider a reed player's angst-ridden
decision to play the sax in a jazz quartet instead of the clarinet in an orchestra as a determining factor -
a vector, or moment of truth - in the emerging sense of himself and the life he will lead, as much as
having been brought about by earlier factors of which he was merely the locus or result. This can only
expand the conception of the individual self as a key player in his own dynamics, not merely a dyad of
conscious puppet and unconscious puppeteer residing in a single body, or of historical cause and
human effect. Further study should address the complementarity and interaction of psychic
mechanisms such as those addressed here and in related study, perhaps with respect to insights from
other areas of study of dynamic systems. There is much in this fertile field to apply to the uniqueness
of the individual and the phenomenon of the dynamic relation with the inner and outer world which
yields something as odd as a personality or a self. In so doing, we may achieve valuable insights into
creativity, work and love in vivo, and not merely in vitro, in cross-section or in theory.
References
Anderson, J.R., (1983). The Architecture of Cognition. Cambridge, MA: Harvard University Press
Bell, M. (1991). An Introduction to the Bell Object Relations and Reality Testing Inventory, Los
Angeles, CA: Western Psychological Services
Bell, M., Billington, R. & Becker, B. (1985). A scale for the assessment of object relations:
reliability, validity and factorial invariance. Journal of Consulting and Clinical Psychology, 42(5),
733-741.
Blatt, S.J. & Ritzler, B.A. (1974). Thought disorder and boundary disturbance in psychosis. Journal
of Consulting and Clinical Psychology, 42(3), 370-381.
Bollas, C. (1992). Being a Character: Psychoanalysis and Self Experience. New York: Hill and Wang
Bollas, C. (1989). Forces of Destiny: Psychoanalysis and Human Idiom. New Jersey: Jason Aronson
Coney, J. & Serna, P. (1995). Creative thinking from an information processing perspective: A new
approach to Mednick's theory of associative hierarchies. Journal of Creative Behavior, Vol. 29,
Number 2, 109-132
Dvorkin, J. (1992) Ego Development and Self Representation Among High School Adolescents in
Music Performing Groups, Doctoral Dissertation, Pace University, Department of Psychology, New
York, NY.
Dvorkin, J., (1991). Individual Music therapy for an adolescent with borderline personality disorder:
An object relations approach. In Case Studies in Music Therapy (ed. Bruscia, K.E.). Phoenixville, PA:
Barcelona Publishers
Eigen, M. (1996). Psychic Deadness. New Jersey: Jason Aronson.
Forbach, G.B. & Evans, R.G. (1981). The Remote Associates Test as a predictor of productivity in
brainstorming groups. Applied Psychological Measurement, 5, 333-339.
Greenberg, J.R., & Mitchell, S.A. (1983). Object Relations in Psychoanalytic Theory. Cambridge,
MA: Harvard University Press.
Hartmann, E. (1990). Thin and thick boundaries: personality, dreams and imagination. In Mental
Appendix A
Hartmann et al. organized their questions into twelve general categories, which comprise the subscales
of the questionnaire. Each subscale contains questions about experiences, opinions, preferences,
tendencies, etc., reflecting more concretely the personal boundary tendencies of the subject. They are:
1. Sleep/Wake Dream
example: "When I wake up in the morning, I am not sure whether I am really awake for a few minutes"
2. Unusual Experiences
example: "I have had deja vu experiences"
3. Thoughts, Feelings, Moods
example: "Sometimes I don't know whether I am thinking or feeling"
4. Childhood, Adolescence, Adulthood
example: "I am very close to my childhood feelings"
5. Interpersonal
example: "When I get involved with someone, sometimes we get too close"
6. Sensitivity
example: "I am very sensitive to other people's feelings"
7. Neat, Exact, Precise
example: "I keep my desk and worktable neat and well organized"
8. Edges, Lines, Clothes
example: "I like houses with flexible spaces, where you can shift things around and make different
uses of the same rooms"
9. Opinions about Children
example: "I think a good teacher must remain in part a child"
samples undergoing treatment. Validity estimates are based on the use of the BORRTI with diverse
and divergent clinical populations over time. The authors report on more than a dozen such studies
prior to 1991 in which the BORRTI has demonstrated discriminate, concurrent and predictive validity,
others in which it has been used as an outcome measure, and a review in Tests Critiques (Alpher,
1990) which concluded that it is a reliable and valid instrument for assessment of object relations and
reality testing.
Proceedings abstract
wcooper@utdallas.edu
Background:
Aims:
Participants in this study will engage in a dual-task paradigm where they will
be exposed to both a musical stimulus (either a brief rhythm sequence or a
brief pitch sequence) and a non-musical stimulus (either a string of digits or
a temporally presented sequence pattern of lighted squares). They will be asked
to remember both sequences simultaneously. Immediately following this
presentation, subjects will perform a two-alternative forced choice decision
for each of the two stimuli presented. Each of the two-alternative forced
choice tasks will present a correct and incorrect sequence, respective to the
earlier presented stimulus. For each of the two tasks, the subject responds by
indicating which stimulus sequence was presented earlier.
Results:
Results from the previous study indicate that the performance on memory tasks
for the four stimuli sequences is poorer when digit sequences are paired with
rhythm or pitch sequences, and when rhythm sequences are paired with block
sequences (lighted squares). It is predicted that the current experiment will
produce similar results.
Conclusions:
Baddeley's model of working memory seeks to account for how one might store
phonological and visual-spatial information in working memory. However, it is
not clear how this model accounts for the storage of different types of musical
information. The line of research described here helps to further
define Baddeley's model by indicating that the resources used to process
musical and non-musical information overlap in predictable ways.
Proceedings paper
Nearly all memory theorists agree that two forms of memory storage exist: short-term memory and long-term memory (James, 1890, was the
first to propose this duality). Short-term memory refers to the information that forms the focus of current attention, that remains in
consciousness after it has been perceived, and that forms part of the psychological present, while long-term memory contains information
about events that have left consciousness and are therefore part of the psychological past. It holds information for a long time - days,
months, years. It is obvious that short-term memory plays an important role in the perception of music. Successful processing of
just-perceived pitch and temporal information requires keeping the perceived stimuli in a short-term memory store for a certain period of time.
Empirical work concerning memory for tempo is scarce. The significant study by Levitin and Cook (1996) was devoted to long-term
remembering of the tempo of familiar songs. However, no research has yet addressed short-term remembering of tempo in general form,
i.e. without association with a specific piece of music. The psychology of music possesses a large body of knowledge in the domain of
short-term memory for pitch (see Deutsch, 1999). It is known that there is a special memory store for pitch and that memory for pitch
decays very slowly. However, no similar knowledge is available on short-term memory for temporal information, for instance for the rate
of a tempo.
Our research was devoted to short-term memory for the rate of a presented metronomic sequence. The aim of the present study was to
investigate how memory for the rate of a tempo gradually decays and to test whether the decay of memory differs in various tempo
zones. The experiment was designed as a pilot study of this problem.
Method
Thirty-four subjects, music amateurs, aged between 19 and 28 years, participated in the experiment. They were asked to listen to a short
sequence in the standard tempo and, after a retention interval, to reproduce the rate of the sequence by finger tapping on a special tapping
device connected to a computer. The device made it possible to measure the durations of the intertap intervals produced by the subjects.
The standard tempo was presented via an electronic metronome. The rate of a tempo was defined by the duration of the intervals between
metronome clicks. The following tempi were used: 300, 600, 900, 1200, and 1500 msec. The retention interval was 3, 10, 20, or 30 sec.
The end of the retention interval was marked by a green light, which appeared on the computer display.
All stimuli were presented in a random order. During the experimental session, subjects completed 20 trials, each trial consisting of a
particular combination of tempo rate and retention interval (5 tempi x 4 retention-interval durations). Subjects were asked to avoid
rhythmic body movements and/or internal continuation of the tempo (counting or continuous mental presentation of the tempo) during
the retention interval.
Results
For each trial, the mean duration of the intertap intervals, representing the rate of the retrieved tempo, was computed. Memory decay was expressed as the magnitude of the error of the retrieved tempo with respect to the rate of the standard tempo, in the form of the absolute deviation (the absolute magnitude of the positive or negative deviation between the retrieved and the standard tempo; see Fig. 1).
Fig. 1. Memory decay of the retrieved tempo as a function of tempo zone and duration of the retention interval. Means and standard errors are presented. The error is expressed as the absolute deviation between the retrieved and the standard tempo.
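The error measure used here can be sketched in a few lines. The function below is an illustrative sketch only (the paper gives no code, and the names are assumptions): it averages a trial's intertap intervals and takes the absolute difference from the standard metronome interval.

```python
def tempo_error(intertap_intervals_ms, standard_interval_ms):
    """Absolute deviation of the reproduced tempo from the standard.

    The reproduced rate is taken as the mean intertap interval; the
    error is its absolute difference from the standard interval.
    (Illustrative sketch; names are assumptions, not from the paper.)
    """
    reproduced = sum(intertap_intervals_ms) / len(intertap_intervals_ms)
    return abs(reproduced - standard_interval_ms)

# A subject tapping slightly fast against the 600 ms standard:
print(tempo_error([570, 585, 590, 575], 600))  # 20.0
```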
The effects of (1) the duration of the retention interval and (2) the rate of the tempo on the magnitude of the error (absolute deviation) were tested with a repeated measures ANOVA. Both the duration of the retention interval (F = 51.59, p < .000001) and the tempo (F = 5.38, p < .000001) showed significant effects. Generally, the longer the retention interval, the greater the error. However, a clear effect of memory decay as a function of retention-interval duration was observed only in the fast and intermediate tempi. The error was greater in slower tempi than in faster tempi. When the effect of retention-interval duration on the absolute deviation was tested within particular tempo groups, a significant effect was found only for the 300 and 1200 ms tempi.
Discussion
In our experiment we demonstrated that memory for the rate of a tempo depends on the duration of the retention interval and on the tempo zone. However, the memory decay we observed was not very large; especially in fast and intermediate tempi the decay was rather small even after a retention interval of 30 seconds. On the other hand, it is surprising that even after the shortest retention interval of 3 seconds the tempo recall was not very precise. This contrasts with the assumption of time psychologists that a temporal interval of 3 seconds falls within the psychological present (for instance, Fraisse, 1984).
To date there is no adequate theory describing the process of remembering a tempo rate. There is obviously a substantial difference between the task of remembering a tempo and classical memory experiments, where subjects are asked to remember, e.g., digits or letters. Schulze (1978) and Keele et al. (1989) describe a memory model for the discrimination of tempo change. According to the model, a listener derives from a regular temporal sequence an internal reference interval, which he or she uses as a mental representation of the rate of the sequence. In line with this model, we may assume that in our experimental task the subjects encoded the rate of the standard tempo in the form of a reference interval, which they then remembered. Memory decay would then be caused by gradual forgetting of the precise duration of the reference interval. We also assume that the subjects aided their remembering by applying a categorization strategy: they assigned the tempi they perceived to particular tempo categories (for instance "very fast", "fast", "intermediate", ...). If they forgot the precise rate of a tempo, they could simply produce a similar tempo falling into the same category. This encoding strategy might explain the relatively small and slow memory decay of the remembered tempo rate.
References
Deutsch, D. (1999). The processing of pitch combinations. In D. Deutsch (Ed.), The Psychology of Music (pp.349-411), 2nd Edition. San
Diego: Academic Press.
Fraisse, P. (1984). Perception and estimation of time. Annual Review of Psychology, 35, 1-36.
Keele, S.W., Nicoletti, R., Ivry, R. I., & Pokorny, R. A. (1989). Mechanisms of perceptual timing: Beat-based or interval-based judgment?
Psychological Research, 50, 251-256.
Levitin, D. J. & Cook, P. R. (1996). Memory for musical tempo: additional evidence that auditory memory is absolute. Perception &
Psychophysics, 58, (6), 927-935.
Schulze, H. H. (1978). The detectability of local and global displacements in regular rhythmic patterns. Psychological Research, 40,
173-181.
Proceedings paper
Introduction
The purpose of this paper is to investigate the nature of implicit memory for musical rhythm.
A great deal of recent research on memory has been devoted to examining the relation between explicit and implicit forms of memory. Explicit memory refers to conscious or intentional recollection of previous experience; implicit memory, in contrast, refers to unintentional retrieval of previously acquired information on tests that do not require intentional recollection of a specific prior episode. Recent research has revealed dissociations between explicit and implicit memory. For example, some studies have shown that explicit and implicit memory are differentially affected by such variables as study/test modality shifts (e.g., Graf, Shimamura, & Squire, 1985; Roediger & Blaxton, 1987; Schacter & Graf, 1989), levels and types of study processing (e.g., Graf & Schacter, 1989; Jacoby, 1983; Schacter & Graf, 1986) and various other manipulations (e.g., Hayman & Tulving, 1989a, 1989b; Mitchell & Brown, 1988).
Implicit memory can be confirmed by examining what is called the priming effect. Priming is a phenomenon in which the processing of a preceding stimulus influences the processing of a succeeding stimulus; it is classified into two types, direct priming and indirect priming. Direct priming is observed when the preceding stimulus is exactly the same as the succeeding stimulus; it is therefore also called repetition priming or perceptual priming. In contrast, indirect priming is observed when there is a semantic relation between the preceding and succeeding stimuli. In this article the term "priming" is used to refer to direct priming.
Research concerning implicit memory has focused almost exclusively on tests involving visual processing. Among studies using verbal materials, for example, word identification (e.g., Graf & Ryan, 1990; Jacoby & Dallas, 1981), fragment and stem completion (e.g., Hayman & Tulving, 1989a; Roediger & Blaxton, 1987), and lexical decision (e.g., Rueckl, 1990; Scarborough, Gerard, & Cortese, 1979) have been used as tests involving visual processing. There are also many papers on implicit memory for nonverbal objects, using tasks such as picture completion (e.g., Jacoby, Baker, & Brooks, 1989; Snodgrass, 1989), picture naming (e.g., Bartram, 1974; Mitchell & Brown, 1988), object decision (e.g., Schacter, Cooper, & Delaney, 1990), and pattern completion and identification (e.g., Musen & Treisman, 1990).
Similarly, some research has explored implicit memory in the auditory domain. Several studies have demonstrated priming effects on auditory word-identification and sentence-identification tasks (e.g., Franks, Plybon, & Auble, 1982; Jackson & Morton, 1984; Schacter & Church, 1992), on an auditory stem-completion task (Bassili, Smith, & MacLeod, 1989; McClelland & Pring, 1991), and the like. Although there is relatively little research in this field, a number of studies of implicit memory in the auditory domain have appeared over the past few decades.
In comparison with these lines of research, the study of memory for musical information has so far only scratched the surface of the topic. More precisely, there are some studies concerned with musical information, but most of them deal with explicit memory; for example, pitch information (e.g., Deutsch, 1970; Massaro, 1970; Sloboda, 1976), short tone sequences with pitch height (e.g., Mikumo, 1990, 1992, 1994a, 1994b), melodic contour (e.g., Bartlett & Dowling, 1980; Dowling & Bartlett, 1981; Dowling & Fujitani, 1971), and so on. These studies have investigated the nature of explicit
These tone sequences were generated to satisfy three constraints: (a) the tone sequence consisted of more than one kind of note value (in other words, it did NOT consist of a single note value); (b) the tone sequence was not extremely syncopated; and (c) the tone sequence was metrical rather than random. The reason for these constraints was that a tone sequence consisting of only one note value is too simple to be appropriate as experimental material, and that a non-metrical tone sequence is more difficult to memorize than a metrical one (cf. Povel and Essens, 1985). With regard to constraints (b) and (c), two musicians who did not participate in the present experiment rated all the candidate tone sequences in terms of how "metrical" each tone sequence was on a 7-point scale (1 = not metrical, 7 = metrical enough), and the tone sequences judged "metrical" were used in the present experiment.
Participants studied 21 tone sequences. The remaining 21 were not studied; they were included in the priming task in order to determine baseline levels of performance, and in the recognition task as distractor items. The priming task and recognition task thus consisted of 42 critical items: 21 studied tone sequences and 21 nonstudied tone sequences. The presentation order of tone sequences on both tasks was determined randomly for each subject. The sound pressure level of all tones was equal, at a comfortable listening level (about 75 dB SPL). The distance between the subject and the speaker was about 70 cm. The timbre of all notes was the same (a piano sound). The pitch of all notes was the same in the study phase, A4, and was shifted either to E5 or D4 in the test phase (this is described more precisely in the procedure section).
Procedure
All subjects were tested individually in a soundproof chamber. Each experiment was conducted under conditions of incidental encoding: subjects were told that they were participating in a preliminary experiment on music perception, and no mention of a later memory test was made.
Study phase. In the study phase, subjects were informed that rhythmic tone sequences would be presented from the speaker and that their task was to rate the "coherence" of each tone sequence on a 7-point scale (1 = not coherent at all, 7 = completely coherent). They were further instructed to listen to each tone sequence carefully, because the tone sequences were very short, and were told that it was important for them to make an accurate rating. The study phase then began with five practice items, followed by presentation of the 21 critical tone sequences in a random order.
Test phase. Immediately after the study presentation, half of the subjects were given instructions for the priming task and the other half for the recognition task.
The priming task was a two-alternative forced-choice task. On each trial, two tone sequences (an old and a new item, in random order) were presented in succession at an interval of 2.0 sec. The loudness of the two tone sequences (that is, the intensity of all notes in each sequence) was the same.
The pitch of one third of the old items, or 7 tone sequences, in the priming task was raised by shifting all pitches 5 scale steps higher, from A4 to E5. Similarly, another 7 tone sequences were lowered by shifting all pitches 5 scale steps lower, from A4 to D4 (Figure 1). The pitch of the old and new items was the same on each trial. Subjects were told that their task was to judge which of the two tone sequences was louder; they were instructed to listen carefully and mark their answer at the designated place on the sheet.
The recognition task was a surprise yes/no task. Subjects were instructed to mark either "Hai" ("yes" in Japanese) on the sheet if they remembered hearing the tone sequence during the prior rating task, or "Iie" ("no" in Japanese) if they did not remember hearing it.
Figure 1. Example of the stimuli used in this experiment: A is the original tone sequence used in the study phase and in the pitch-identical condition of the test phase, B is the tone sequence used in the pitch-up condition, and C is the tone sequence used in the pitch-down condition.
The pitch of one third of the items, or 14 tone sequences, in the recognition task was raised by shifting all pitches 5 scale steps higher, from A4 to E5. Similarly, another 14 tone sequences were lowered by shifting all pitches 5 scale steps lower, from A4 to D4. Six practice items (three new and three old) were presented before the 42 critical items. The order of presentation was randomized. As in the priming task, a period of about 1 min intervened between the completion of the study task and the appearance of the first critical item on the recognition test. The exact length of time needed to complete the recognition task varied from subject to subject, but it generally took about 10 min.
After the completion of the test, all subjects were debriefed about the nature and purpose of the experiment.
Results
The results of the priming and recognition tasks are first considered separately and then followed by a contingency analysis
of the relation between them.
Priming task. Because the priming task was a two-alternative forced-choice task, only the hit rate was analyzed. The hit rate was the proportion of studied items called "louder." The priming effect was defined as the difference between the hit rate and the chance level (50%; Figure 2).
Figure 2. The result of the priming task. The dotted line represents the chance level of 50%.
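As a minimal sketch of this measure (the function name and data layout are assumptions, not from the paper), the priming effect is simply the hit rate minus the 50% chance level:

```python
def priming_effect(chose_studied, chance=0.5):
    """Hit rate (proportion of trials on which the studied item was
    called "louder") minus the chance level. Illustrative sketch."""
    hit_rate = sum(chose_studied) / len(chose_studied)
    return hit_rate - chance

# A subject who picked the studied item on 3 of 4 trials:
print(priming_effect([True, True, True, False]))  # 0.25
```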
Three important points should be noted about the results of the priming task. First, a priming effect was observed in all conditions: subjects selected the studied items correctly in the pitch-identical, pitch-up, and pitch-down conditions. Second, the priming effect in the pitch-identical condition was larger than in the pitch-up and pitch-down conditions. Third, the priming effect was almost the same in the pitch-up and pitch-down conditions, which shows that the direction of the pitch shift does not influence the magnitude of the priming effect.
An analysis of variance (ANOVA) confirmed this description of the results. A significant main effect of pitch shift was observed (F(2, 38) = 12.33, p < .01). Tukey's HSD test showed significant differences between the pitch-identical condition and the pitch-up condition and between the pitch-identical condition and the pitch-down condition (HSD = 0.24, p < .01, and HSD = 0.19, p < .01, respectively). No significant difference was observed between the pitch-up and pitch-down conditions.
A further analysis was then performed to confirm the difference between the hit rate and the chance level in each condition. In all conditions, t tests revealed that the hit rate was significantly higher than the chance level (t(19) = 3.11, p < .001; t(19) = 2.96, p < .001; t(19) = 3.21, p < .001, respectively).
Figure 3. The result of recognition task. The dotted line represents the chance level of 50%.
Recognition task. Two different measures of recognition were subjected to an ANOVA: the hit rate, and the hit rate minus the false alarm rate. Since both analyses led to an identical conclusion, only the hit rate analysis is reported; this simply reflects the fact that false alarm rates were relatively constant across conditions.
An overall ANOVA revealed no significant main effect of pitch shift, and the difference between the hit rate and the chance level was not significant in any condition (Figure 3).
Contingency analysis of coherence judgment and recognition performance. The purpose of the contingency analysis was to determine whether priming effects on coherence performance are dependent on, or independent of, recognition memory. To determine the relation between the priming task and the recognition task, the Yule Q statistic was used. Q is a measure of the strength of the relation between two variables that can vary from −1 (negative association) to +1 (positive association); 0 indicates complete independence (Hayman & Tulving, 1989a). For the present data, Q = +.082 in the pitch-identical condition, +.99 in the pitch-up condition, and +.099 in the pitch-down condition. These values did not differ significantly from zero; significance was assessed by a chi-square test suggested by Hayman and Tulving: χ2(1, N = 20) = 0.52 in the pitch-identical condition, χ2(1, N = 20) = 0.42 in the pitch-up condition, and χ2(1, N = 20) = 0.48 in the pitch-down condition. The contingency analysis thus demonstrates stochastic independence between recognition and coherence judgment performance.
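For a 2×2 contingency table with cell counts a, b, c, d, Yule's Q is (ad − bc)/(ad + bc). The sketch below illustrates the statistic; the cell labels are assumptions for illustration, not taken from the paper:

```python
def yule_q(a, b, c, d):
    """Yule's Q for a 2x2 contingency table.

    Assumed layout: a = correct on both tasks, b = priming correct /
    recognition wrong, c = priming wrong / recognition correct,
    d = wrong on both. Q = (a*d - b*c) / (a*d + b*c), ranging from -1
    (negative association) through 0 (independence) to +1.
    """
    return (a * d - b * c) / (a * d + b * c)

print(yule_q(5, 5, 5, 5))  # 0.0 -> complete independence
print(yule_q(9, 1, 1, 9))  # close to +1 -> strong positive association
```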
Discussion
In this study, implicit memory for music was investigated experimentally. A priming task and a recognition task were performed with musical rhythms, that is, rhythmic tone sequences consisting of more than one note value. The following results were obtained in the priming task. First, perceptual priming was observed not only in the pitch-identical condition but also in the pitch-up and pitch-down conditions; this shows that information independent of pitch height is encoded in the priming representation of a rhythmic tone sequence. Second, the priming effect in the pitch-up and pitch-down conditions was smaller than in the pitch-identical condition, which indicates that information dependent on the pitch height of the stimuli is also encoded in the priming representation. These results lead to the conclusion that the priming representation of a rhythmic tone sequence encodes both pitch-dependent and pitch-independent information.
Third, the magnitude of the priming effect was the same in the pitch-up and pitch-down conditions; in other words, the direction of the pitch shift did not influence the priming effect. This result can be explained as follows: the priming effect in the pitch-up and pitch-down conditions is carried only by information independent of pitch height, because the pitch-dependent information is lost when the pitch is raised or lowered, and the remaining information is unaffected by the direction of the shift.
On the recognition task, performance was at chance level in all conditions, and the results of the recognition task were statistically independent of the priming task. Coherence judgment performance therefore does not appear to have been influenced by recognition performance.
After the experiment, almost all subjects assigned to the priming task stated that they had been unable to judge which of the tone sequences was louder; they often said that they had judged haphazardly. The results show, however, that they correctly selected the studied items as the "louder" tone sequences. This suggests that the task was performed under incidental encoding and that the subjects' judgments were unconscious. We may therefore reasonably conclude that the loudness judgment task reflects implicit memory.
As stated at the outset, the primary purpose of this research was to investigate whether a direct priming effect for a rhythmic tone sequence could be observed. The experiment showed a priming effect for rhythmic tone sequences, confirming implicit memory for musical tone sequences. The results also suggested that pitch has a certain influence on the priming representation of rhythm. We conclude that pitch information is encoded in the priming representation of a rhythmic tone sequence equally, regardless of the direction of the pitch change.
This research examined implicit memory for musical rhythm, clarifying the nature of the priming representation of rhythmic tone sequences with particular attention to the pitch height of the notes. Needless to say, the whole nature of the priming representation of rhythmic tone sequences cannot be clarified by the single experiment described above. There is room for further investigation: for example, whether the degree of pitch shift influences the magnitude of the priming effect, whether other note information, such as timbre and intensity, can be represented in the priming representation, and whether a non-metrical rhythmic tone sequence produces a similar priming effect. These questions remain to be examined in future investigations.
References
Bartlett, J. C., & Dowling, W. J. (1980). The recognition of transposed melodies: A key-distance effect in developmental perspective. Journal of Experimental Psychology: Human Perception and Performance, 12, 403-410.
Bassili, J. N., Smith, M. C., & MacLeod, C. M. (1989). Auditory and visual word-stem completion: Separating data-driven and conceptually driven processes. Quarterly Journal of Experimental Psychology, 41A, 439-453.
Bartram, D. J. (1974). The role of visual and semantic codes in object naming. Cognitive Psychology, 6, 325-356.
Bharucha, J. J., & Stoeckig, K. (1986). Reaction time and musical expectancy: Priming of chords. Journal of Experimental Psychology: Human Perception and Performance, 12, 403-410.
Bharucha, J. J., & Stoeckig, K. (1987). Priming of chords: Spreading activation or overlapping frequency spectra? Perception and Psychophysics, 41, 519-524.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in short-term memory. Science, 168, 1604-1605.
Dowling, W. J., & Bartlett, J. C. (1981). The importance of interval information in long-term memory for melodies.
Psychomusicology, 1, 30-49.
Dowling, W. J., & Fujitani, D. A. (1971). Contour, interval, and pitch recognition in memory for melodies.
Perception and Psychophysics, 9, 524-531.
Franks, J. J., Plybon, C. J., & Auble, P. M. (1982). Units of episodic memory in perceptual recognition. Memory & Cognition, 10, 62-28.
Goto, Y., & Abe, J. (1998). Psychological reality of metrical units in rhythm perception. Proceedings of the Fifth International Conference on Music Perception and Cognition, 335-340.
Graf, P., & Ryan, L. (1990). Transfer-appropriate processing for implicit and explicit memory. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 16, 978-992.
Graf, P., & Schacter, D. L. (1989). Unitization and grouping mediate dissociations in memory for new associations,
Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 930-940.
Graf, P., Shimamura, A. P., & Squire, L. R. (1985). Priming across modalities and priming across category level:
Extending the domain of preserved function in amnesia. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 11, 385-395.
Hebert, S., & Peretz, I. (1997). Recognition of music in long-term memory: Are melodic and temporal patterns equal
partners? Memory & Cognition, 25, 518-533.
Hayman, C. A. G., & Tulving, E. (1989a). Contingent dissociation between recognition and fragment completion: The method of triangulation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 228-240.
Hayman, C. A. G., & Tulving, E. (1989b). Is priming in fragment completion based on a "traceless" memory system? Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 941-956.
Jackson, A., & Morton, J. (1984). Facilitation of auditory word recognition. Memory & Cognition. 12, 568-574.
Jacoby, L. L. (1983). Perceptual enhancement: Persistent effect of an experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 21-38.
Jacoby, L. L., Baker, J. G., & Brooks, L. R. (1989). Episodic effects on picture identification: Implications for theories of concept learning and theories of memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 275-281.
Jacoby, L. L., & Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.
Johnson, M. K., Kim, J. K., & Risse, G. (1985). Do alcoholic Korsakoff's syndrome patients acquire affective reactions? Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 22-36.
Kawaguchi, J., & Mikumo, M. (1994). Implicit memory for music information: Priming effect on major-minor decision task for chord. Abstract of the 3rd Practical Aspects of Memory Conference, Maryland, U.S.A., 107.
McClelland, A. G. R., & Pring, L. (1991). An investigation of cross-modality effects in implicit and explicit memory. Quarterly Journal of Experimental Psychology, 43A, 19-33.
Mandler, G., Nakamura, Y., & Zandt, B. J. S. V. (1987). Nonspecific effects of exposure on stimuli that cannot be recognized. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 646-648.
Mitchell, D. G., & Brown, A. S. (1988). Persistent repetition priming in picture naming and its dissociation from recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 213-222.
Massaro, D. W. (1970). Retroactive interference in short-term recognition memory for pitch. Journal of Experimental
Psychology, 83, 32-39.
Mikumo, M. (1990). Merodi no fugouka to sainin [ Coding strategies and recognition of melodies]. The Japanese
Journal of Psychology, 61, 291-298. [In Japanese].
Mikumo, M. (1992). Encoding strategies for tonal and atonal melodies. Music Perception, 10, 73-81.
Mikumo, M. (1994a). Finger-tapping for pitch encoding of melodies. Japanese Psychological Research, 36, 53-64.
Mikumo, M. (1994b). Motor encoding strategy for pitches of melodies. Music Perception, 12, 175-197.
Musen, G., & Treisman, A. (1990). Implicit and explicit memory for visual patterns. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 127-137.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception, 2, 411-440.
Roediger, H. L. III, & Blaxton, T. A. (1987). Effects of varying modality, surface features, and retention interval on priming in word-fragment completion. Memory & Cognition, 15, 379-388.
Rueckl, J. G. (1990). Similarity effects in word and pseudoword repetition priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 374-391.
Scarborough, D. L., Gerard, L., & Cortese, C. (1979). Accessing lexical memory: The transfer of word repetition effects across task and modality. Memory & Cognition, 7, 3-12.
Schacter, D. L., & Church, B. A. (1992). Auditory priming: Implicit and explicit memory for words and voices. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 915-930.
Schacter, D. L., Cooper, L. A., & Delaney, S. M. (1990). Implicit memory for unfamiliar objects depends on access to structural descriptions. Journal of Experimental Psychology: General, 119, 5-24.
Schacter, D. L., & Graf, P. (1986). Effect of elaborative processing on implicit and explicit memory for new associations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 432-444.
Schacter, D. L., & Graf, P. (1989). Modality specificity of implicit memory for new associations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 3-12.
Schulkind, M. D. (1999). Long-term memory for temporal structure: Evidence from the identification of well-known
and novel songs. Memory & Cognition, 27, 896-906.
Sloboda, J. A. (1976). Visual perception of musical notation: Registering pitch symbols in memory. Quarterly
Journal of Experimental Psychology, 28, 1-16.
Snodgrass, J. G. (1989). Sources of learning in the picture fragment completion task. In S. Lewandowsky, J. C. Dunn,
& K. Kirsner (Eds.), Implicit memory: Theoretical issues (pp. 259-282). Hillsdale, NJ: Erlbaum.
Tekman, H. G., & Bharucha, J. J. (1992). Time course of chord priming. Perception and Psychophysics, 51, 33-39.
White, B. W. (1960). Recognition of distorted melodies. American Journal of Psychology, 73, 100-107.
Proceedings paper
"Lunch Box" was chosen as the chanting material for the Japanese subjects for good reason: it is not a traditional rhyme but a relatively new one, which spread among the postwar generation. Nowadays it is no exaggeration to say that all Japanese children learn it in kindergarten or nursery school. It is often said that the rhythmic feeling of the younger Japanese generation differs considerably from that of the older one. This study therefore focuses on the clapping behavior of the younger generation as the Japanese subjects, which is an important reason why "Lunch Box" was selected.
English rhyme, "One Potato"
Óne Potato Twó Potatoes Thrée Potatoes Fóur,
Fíve Potatoes Síx Potatoes Séven Potatoes Móre,
Japanese rhyme, "Lunch Box"
Korékkurainó Obéntobakoní, Ónigiriónigiri Chóttotsumeté,
Kízamishóugani Gómajiopáppa, Nínjinsánn, Góbosán
Ánanoáita Rénkonsán, Sújinotóutta Fúkí
Apparatus
A clipboard holding a sheet printed with the rhyme, a metronome, and a videotape recorder were set in front of the subjects. Each subject wore a motion sensor, "ATOM 8", on the right wrist (see Photos 1 and 2), which detects the speed of rotary motion. Each performance was recorded on digital videocassette and simultaneously digitized by a computer (Macintosh PowerBook G3) through ATOM 8. The data were sampled every 0.02 sec.
Photo 1: "Open". Photo 2: "Hold".
Procedures
First of all, each subject was asked whether he was familiar with the prepared rhyme. Everyone answered "yes" and was then oriented to the experimental phases. Subjects were asked to wear "ATOM 8" on the right wrist and to clap on the downbeat while chanting the rhyme individually. A common tempo, M.M. = 108, which seemed somewhat slow for both groups, was adopted because it is rhythmically acceptable and because a slow tempo makes the differences in clapping behavior between the two groups clearer. The experimental procedure consisted of two phases, a training phase and a testing phase. In the former, each subject practiced chanting the rhyme and clapping in synchrony with the metronome sounds. After two or three trials they entered the testing phase, in which they performed the task three times. All the participants accomplished the task successfully.
DATA PROCESSING
The last trial of each subject was taken as representative of that subject's trials. The six claps at the beginning and the six at the end of the trial were excluded from further analysis, because performance was sometimes unstable at the beginning and end of trials. Thus, twelve claps from the middle part of the last trial of each subject were analyzed.
A cycle of clapping motion was divided into three parts: "Close", "Hold", and "Open". In addition, a state of "Stop" within "Hold" was taken into account in the data processing. Table 1, a fragment of the digitized raw data of a subject from the English group, shows the progression of one cycle of clapping motion. The rotary motion value (R.M.) indicates the direction and speed of the rotary motion of ATOM 8: if ATOM 8 turns clockwise, R.M. is positive, and if it turns anticlockwise, R.M. is negative. Furthermore, the absolute value of R.M. correlates with the speed at
RESULTS
See Figures 3 and 4 again. In the English group the "Open" motion starts before the upbeat, whereas in the Japanese group it starts together with the upbeat. In other words, at the moment the upbeat arrives, the English native speakers are already in the middle of the "Open" motion, while the Japanese subjects are only beginning to open their hands. In order to explain this difference, the second point, namely the difference in the time allocated to the "Hold" motion between the two groups, should be
Proceedings paper
Background
Musicologists have addressed the issue of dotting ratios in performance practice by citing the commentary of contemporary writers such as
Couperin (1716), Quantz (c.1752) and C. P. E. Bach (1753), and by relying on their own experience, knowledge and intuitions (eg.,
Donington, 1989; Neumann, 1982, 1993). However, no performance practice researchers have attempted to determine the issue of the
perception of dotting empirically. Dotting ratio refers to the performance of a long note followed by a short note. For example, a mechanical
performance of a dotted quaver (dotted eighth note) followed by a semiquaver (sixteenth note) lasting for 1 beat divides the beat into the relative
temporal durations 0.75 of a beat and 0.25 of a beat respectively. In practice, however, this ratio is rarely found. The second author's own
findings using digital sampling and analysis techniques have shown that across a sample of 30 different performances of the opening of
'Variation 7' of J. S. Bach's Goldberg Variations, dotting ratios were consistently greater than 0.81 (Fabian, 1998).
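The ratio arithmetic above can be sketched as follows; the function name and the sample values are illustrative only, not taken from the study:

```python
def dotting_ratio(long_dur, short_dur):
    """Relative duration of the long note within the long+short pair."""
    return long_dur / (long_dur + short_dur)

# Mechanical dotted quaver + semiquaver within one beat:
print(dotting_ratio(0.75, 0.25))  # 0.75
# A hypothetical overdotted performance:
print(dotting_ratio(0.85, 0.15))
```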
Some ambiguity in the actual measured ratios and the perceived dottedness was also noticed. Some performances sounded more dotted when
the duration of the third note in the group was shortened (Figure 1d, described in more detail later). The finding pointed to a possible illusion
which had not yet been discovered. Consequently, we designed an experiment to test whether there was an illusion of dottedness when the
third note was shortened, or 'kerned'. The methodology was based on an experimental psychology approach. Since this is not an approach
used in traditional musicology, the experiment was also designed to address the issue of the suitability of the methodology in addressing
performance practice issues.
Figure 1. Transcription of four hypothetical performances of the right hand part of first two bars of 'Variation 7' from J. S. Bach's Goldberg Variations.
KO - no kerning
Kerning
The temporal gap between a dotted note and the short note which follows has been documented in the literature and is a known performance
technique. It is referred to as silence d'articulation (noun) after Quantz (Quantz 1752, Dolmetsch 1949). However, its use in the third note of
a group of three in 6/8 has not received attention in scholarly writing despite its prominence in the recordings analysed. To investigate the
effect of shortening the first or third note of the group of three notes in 6/8 metre, we defined the quantification of this auditory gap as the
'kerning' (verb) of the note. Mathematically:
Kerning = IOI - Duration
Where IOI is the interonset interval between the first and second note, and Duration is the length of time for which the first note is sounded.
Two examples of kerning are shown in Figure 1c and d.
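The definition above translates directly into code; the timing values below are hypothetical, chosen only to illustrate the computation:

```python
def kerning(ioi, duration):
    """Silent gap between a note's offset and the next onset (Kerning = IOI - Duration)."""
    return ioi - duration

# Hypothetical timings in milliseconds for a dotted quaver:
ioi = 425       # interonset interval to the following semiquaver
duration = 360  # length of time for which the note is sounded
print(kerning(ioi, duration))  # 65 ms of silence d'articulation
```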
Aim
In this paper we report the findings of a study in which participants made judgements about the perceived 'dottedness' of a 6/8 musical
pattern. In particular we investigated whether the participants were susceptible to the hypothesised 'kerning illusion', where the dotted pattern
in 6/8 is perceived as being more dotted if the third note (quaver) is kerned (e.g., see Figure 1d).
Method
The dotting ratio and kerning was manipulated in the context of a 6/8 metre, based on the opening of 'Variation 7', meaning that the metric
unit consists of an additional quaver as a third member of the dotted quaver-semiquaver pattern. Given the range of tempi found in real
performances, the tempo of the test stimuli was also manipulated to encompass the extreme tempos found in the sample initially analysed.
The design of the experiment was 2 x 3 x 2 and consisted of the manipulation of a MIDI sequence recorded from a student performance as
shown in Table 1.
Table 1. Independent Variable Manipulation
Kerning: no kerning (K0); first note kerned (K1); third note kerned (K2). Dotting: mechanical (D1); overdotted (D2). Tempo: two levels.
Participants
Forty people took part in the study. Most were music students at the University of New South Wales, while others were friends and colleagues
of the authors. All were reasonably or very experienced listeners of Baroque music. Participation was voluntary.
Stimuli
Stimuli were produced from a performance by a harpsichord-major music student on a Roland JV-35 touch-sensitive keyboard, recorded as a MIDI
sequence (using Cubase VST 4.1 software). The MIDI file was manipulated to produce the 12 test stimuli described in Table 1. These
sequences were converted to QuickTime sound files using the Harpsichord 1 sound on the Roland JV-35 General MIDI sound module.
Sound files were presented to the participants and controlled by QMaker software (Schubert, 1999). Two additional 'real' recordings of the
same passage were also converted to sound files: one by Ralph Kirkpatrick (1959) and the other by Gustav Leonhardt (1965).
Procedure
Each participant sat at a Macintosh Computer (PowerMac 8500) and completed a preliminary questionnaire. They received training
Results
Repeated measures t-tests demonstrated no significant difference between first and second presentation for each stimulus (alpha = 0.05, df =
39). The data were then analysed in two ways, once with responses collapsed by dotting, tempo and kerning levels, and again by comparing
responses across each of the 14 examples (initial data retained, not averaged).
Analysis of variance produced significant differences along each of the three independent variables investigated, with no significant two- or
three-way interactions (alpha = 0.05). Post-hoc Tukey HSD tests were performed to determine how the independent variables affected
dotting response.
Overdotted examples were rated as being more dotted than mechanical performances. This unsurprising finding provides support for the
validity of the experimental design. A more interesting result is that K2 examples (third note shortened) were rated as the most dotted
relative to K1 (first note shortened) and K0 (neither note shortened), supporting the authors' thesis that there exists an illusion of dottedness.
Finally, faster tempo examples were rated as more dotted.
A second analysis consisted of a Friedman K-Related samples test on difference in perceived dotting piece by piece. Again there was a
significant overall difference (alpha = 0.05, Chi-Square = 40, df = 13). The significant contrasts for dottedness ratings are shown in Table 2.
All D2 (overdotted) examples received significantly higher mean ratings and were rated as significantly more dotted than the D1
(mechanical) examples. Third note kerned examples (K2) all received positive mean dottedness ratings, and were rated as more dotted when
all other variables were held constant. For example K2D1106 has a higher mean dottedness rating than its K0 and K1 equivalents. While
none of these K2 examples on their own were rated as significantly more dotted than their K0 and K1 counterparts, the grouped effect of the
first analysis points to an important trend.
Finally, the placement of the real recordings (by Kirkpatrick and Leonhardt) among the dottedness responses further supports the validity of the
design and the selection of dotting ratios, because they were positioned near the expected location with respect to the other examples.
This study points to the possible existence of an illusion which occurs in 6/8 dotted patterns. When the third note of the first dotted crotchet
beat is truncated or kerned, the listener perceives the group as sounding more dotted. The illusion could be due to the brevity of the third note
itself making the beginning of the next group sound delayed and hence more dotted. Alternatively, there may be some kind of high level,
backward temporal masking which interferes with the perception of the first note. Neither of these explanations provides a mechanistic,
functional, or structural account of the illusion. If the kerning illusion continues to be replicated, an associated challenge is to find a
tenable explanation of it.
Further research is required to demonstrate the reproducibility of the result. For example, it needs to be determined whether this effect is an
illusion or an ambiguous figure. One issue that requires addressing is whether the response of musically less sophisticated participants is the
same as those represented by the sample.
The expected perception of dotted stimuli, and the generalised responses to the 'real' stimuli support the validity and reliability of the
reductionist, controlled approach we chose in investigating this performance practice issue. Further work is now required to see how these
findings can be used to inform duple and quadruple metre performance practice theory. Also, the major issue of performance style needs to
be investigated, since an important aim of studying performance practice is to identify the underlying principles which determine a
stylistically appropriate performance (Fabian, 1998; Fabian Somorjay & Schubert, this volume). By the same token, the perception of dotting
of non-Baroque musical contexts, such as marches, dances and popular music, also provides a rich area for further research.
The discovery of the kerning illusion, if it is indeed an illusion, has important implications for musicology. First, it demonstrates that dotting
theory proposed in duple and quadruple metre cannot be generalised to compound metres. No past theory predicted anything like the kind of
dotting response we noticed (and supported experimentally) using the 6/8 pattern of 'Variation 7' of the Goldberg Variations. Second, the
perceptual approach to investigating musicological questions can help to inform and, we argue, drive musicological theory. Specifically, we
posit that perceptual, reductionist experimental designs provide an appropriate methodology from which musicological study, such as the
present case of performance practice issues, can be greatly enriched. It might be argued that the present study is far removed from music
because of the brevity and the highly controlled manipulation of the stimuli. While the criticism may be true, it is also true that the rather
individualistic approaches seen in traditional musicology lack the methodological rigour of the present approach. Indeed, the present
experiment aimed to address a question that arose from a thorough musicological investigation (i.e. the study of longer/complete and not at
all controlled stimuli). Further, we stand by the view that many kinds of methodologies should be embraced in developing deeper insights
into musicological issues. We argue that interdisciplinary methodologies should be used to inform one another, with the aim of finding
convergent evidence and substantial theoretical frameworks.
Acknowledgement
The authors are grateful to Kate Stevens and members of the Australian Music Psychology Society (AMPS) for their comments on an earlier
draft of this work.
References
Bach, C. P. E. (1753; facsimile 1969). Versuch über die wahre Art das Clavier zu spielen. Leipzig: Hoffmann-Erbrecht.
Couperin, F. (c.1716; facsimile 1977). L'Art de toucher le clavecin. Leipzig: Breitkopf und Härtel.
Dolmetsch, A. (1949). The interpretation of the music of the 17-18th centuries. London: Novello. (1st ed. 1915.)
Donington, R. (1989). Interpretation of early music. (1st ed. 1963.)
Fabian, D. (1998). J. S. Bach recordings 1945-1975: The Passions, Brandenburg Concertos and Goldberg Variations - A study
of performance practice in the context of the early music movement. Unpublished Doctoral Dissertation, University of New
South Wales, Sydney
Neumann, F. (1982). Essays in performance practice. Ann Arbor: UMI RP.
Neumann, F. (1993). Performance practices of the 17-18th centuries. New York: Schirmer.
Quantz, J. J. (1752). On playing the flute (Eng. trans. E. R. Reilly). New York: Schirmer.
Schubert, E. (1999). QMaker 1.02 [computer software]. Sydney, Australia. Author.
Sound Recordings
Kirkpatrick, R. (1959). CD re-issue 1994. DGG Classikon 439 465-2.
Leonhardt, G. (1965). CD re-issue 1995. Teldec DAW 4509-97994-2.
Back to index
Proceedings paper
Participants listened to pure tone sequences. The standard unaccented version of the sequences consisted of twelve pure
tones of fixed frequency. Each tone had a duration of 210 ms. This duration contained a 10 ms rise time and a 10 ms
decay time. The tones were separated by silent intervals of 50 ms, resulting in IOIs of 260 ms. Pitches of the tones for a
sequence were selected randomly from among the chromatic pitches within a one-octave range. Accented versions of the
sequences were prepared by increasing the intensity of four of the twelve tones by 4.5 dB. In the regular version of the
accented sequences two consecutive tones with higher intensity always had two tones with the standard intensity
between them. In the random version of the accented sequences the four tones in a sequence were selected randomly to
be the accented tones, with the restriction that they would not make up a triple rhythm.
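As a sketch of the stimulus construction described above (210 ms pure tones with 10 ms linear rise and decay, accents raised by 4.5 dB), assuming a 44.1 kHz sample rate and linear ramps, neither of which is stated in the text:

```python
import math

SR = 44100  # assumed sample rate

def tone(freq, dur=0.210, ramp=0.010, gain_db=0.0):
    """Pure tone with linear rise/decay ramps; accented tones get a dB gain."""
    amp = 10 ** (gain_db / 20)          # +4.5 dB -> amplitude factor ~1.68
    n = int(dur * SR)
    r = int(ramp * SR)
    out = []
    for i in range(n):
        # envelope rises linearly over the first ramp, decays over the last
        env = min(1.0, i / r, (n - 1 - i) / r)
        out.append(amp * env * math.sin(2 * math.pi * freq * i / SR))
    return out

standard = tone(440)                # standard-intensity tone
accented = tone(440, gain_db=4.5)   # accented tone, +4.5 dB
```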
The signal detection task also required sequences in which the IOIs were unequal. In these sequences four of the tones
came either slightly late or slightly early. In Experiment 1, in the case of positive deviations these deviant tones followed
the preceding tone with an IOI of 285 ms and the consecutive tone followed it by an IOI of 235 ms. Thus only the
deviant tone came late by 25 ms. In the case of negative deviations the deviant tone followed the preceding tone by an
IOI of 235 ms and the consecutive tone followed it by an IOI of 285 ms. Thus the tone came early by 25 ms. In the
accented sequences, the temporally deviant tones also had higher intensity. Both the accented and the unaccented
sequences had regular and random versions depending on the distribution of the deviant tones. In the regular sequences
each consecutive pair of deviant tones were separated by two tones with standard timing. In the random sequences four
tones were selected randomly to be the deviant intervals, with the restriction that two deviant tones would not follow
each other.
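The timing manipulation for Experiment 1 can be sketched as follows; the particular regular positions [2, 5, 8, 11] are an assumption consistent with "separated by two tones with standard timing", not taken from the text:

```python
import random

BASE, SHIFT = 260, 25  # standard IOI and deviation, in ms

def onsets(deviant_idx, direction):
    """Onset times for a 12-tone sequence. Shifting a single tone by
    +25 ms (late) or -25 ms (early) makes the preceding IOI 285/235 ms
    and the following IOI 235/285 ms, without moving any other tone."""
    times = [i * BASE for i in range(12)]
    for i in deviant_idx:
        times[i] += direction * SHIFT
    return times

def random_deviants(rng):
    """Pick 4 deviant positions so that no two deviant tones are adjacent."""
    while True:
        idx = sorted(rng.sample(range(1, 11), 4))
        if all(b - a > 1 for a, b in zip(idx, idx[1:])):
            return idx

regular_late = onsets([2, 5, 8, 11], +1)   # regular, positive deviations
print(random_deviants(random.Random(7)))   # one random arrangement
```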
In Experiment 2, in the accented sequences the tones that followed the temporally deviant tones, rather than the
temporally deviant tones themselves, had higher intensity. Since positive deviations meant longer IOI preceding an
intensity accent, if it was present, a longer (285 ms) IOI was followed by a shorter (235 ms) IOI in sequences with
positive deviations. The shorter IOI was terminated by a higher intensity tone in the accented sequences. In the
sequences with negative deviations, on the other hand, a shorter IOI was followed by a longer IOI. The creation of the
regular and random versions of the sequences was identical to Experiment 1.
Apparatus
Participants were placed in a sound attenuated booth during the experiment. Creation and presentation of the stimuli, and
recording of responses were controlled by an IBM compatible computer equipped with a Creative SoundBlaster 16
Value sound card. Participants heard the stimuli through a Technics SU-V300 amplifier and Telephonics TDH-39P
earphones.
Procedure
Each participant took part in eight experimental signal detection sessions. The experimental sessions were preceded by
two practice sessions. Each session consisted of 72 trials. In each trial one sequence was heard. In 36 trials of a session
the duration of all silent intervals in the sequence were the same. In the remaining 36 trials the sequence contained
deviant intervals. Sequences with and without deviations in a session were ordered randomly. The task of the participants
was to determine whether the duration of all the time intervals separating consecutive tones were the same or not for
each sequence. The participant initiated the trials by pressing any one of the keys on the computer keyboard. Participants
responded by pressing one of two keys assigned to the "Same" and "Different" responses, respectively. Feedback about
the accuracy was given visually on the computer monitor after each response.
The eight experimental sessions were created by factorial combinations of three variables: (1) The sequences were either
unaccented or accented. (2) In case of unequal IOIs, the deviations were either positive or negative. (3) The deviant
intervals were arranged regularly or randomly. In each experimental session the sequences with equal IOIs and unequal
IOIs were similar in all other respects. The order of the eight experimental sessions was changed from participant to
participant according to a Latin square.
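The factorial combination of the three two-level variables yields the eight experimental sessions; a minimal enumeration:

```python
from itertools import product

# The three session-defining variables from the text, each with two levels.
factors = {
    "accenting": ("unaccented", "accented"),
    "deviation": ("positive", "negative"),
    "arrangement": ("regular", "random"),
}

# Each combination defines one of the eight experimental sessions.
sessions = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(sessions))  # 8
```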
Each one of the two practice sessions also consisted of 72 trials. One practice session included unaccented sequences
only and the other included accented sequences only. All variations of these two types of sequences were sampled with
equal probability in the two respective practice sessions. The number of sequences with equal intervals and the number
of sequences with unequal intervals were equal in each practice session.
Results
Sensitivity was calculated for the eight conditions in Experiments 1 and 2 separately. Average sensitivities are given in
Figure 1. In both experiments the main effect of regularity was not significant [F(1, 31) < .01 for Experiment 1 and F(1,
31) = 2.73 in Experiment 2]. Thus, the alterations in the stimuli did not create an effect in favor of regularly distributed
deviations, but they eliminated the effect in favor of randomly distributed deviations that was observed in the earlier
experiment. The main effects, or lack thereof, had to be considered in the light of a possible interaction of the effects of
regularity and accenting. In both experiments sensitivity was higher for regular than for random deviations with the
accented sequences but the difference was in the opposite direction with the unaccented sequences. This interaction
approached but fell short of significance in both experiments [F(1, 31) = 2.9, p = .098 for Experiment 1 and F(1, 31) =
3.88, p = .058 for Experiment 2].
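The sensitivity measure is not specified in this excerpt; assuming it is d′ computed from "Different" responses to deviant sequences (hits) and to equal-interval sequences (false alarms), a standard calculation would be:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a 1/(2N) correction
    to keep proportions of 0 or 1 away from the bounds."""
    z = NormalDist().inv_cdf
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    h = min(max(hits / n_signal, 1 / (2 * n_signal)), 1 - 1 / (2 * n_signal))
    f = min(max(false_alarms / n_noise, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
    return z(h) - z(f)

# Hypothetical counts for one session of 36 deviant and 36 equal trials:
print(round(d_prime(30, 6, 9, 27), 2))  # → 1.64
```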
Because of the similarity of the methods and stimuli the data from both experiments were analyzed in a single ANOVA.
In this analysis the only effect involving experiment that reached significance was the experiment by direction of
deviation interaction [F(1, 62) = 14.25, MSe = 0.48, p < .001]. The difficulty of the positive and negative deviations was
reversed across experiments. This is not surprising considering the sequences with positive deviations in Experiment 1
were identical to the sequences with negative deviations in Experiment 2 except for the placement of the higher intensity
tones, and vice versa.
In the analysis of the combined data from the two experiments the regularity by accenting interaction reached
significance [F(1, 62) = 6.72, MSe = 0.43, p < .05]. Thus, although regularity did not always lead to better performance
[F(1, 62) = 1.06 for the regularity main effect] presence of accents mediated such facilitation. This was consistent with
the dynamic attending theory.
Discussion
The unexpected negative effect of regularity in earlier work could be eliminated by changing the stimuli such that
detection of temporal irregularities depended on noticing temporal alteration of single sounds and not local changes in
tempo. It was found that the combination of regularity with the presence of accents provided a small advantage of regular over
random distribution of deviant time intervals. This effect was still small, and the critical interaction required combining
the data from the two experiments in order to reach significance. However, the finding was consistent with the theory of
dynamic attending: the addition of regularly distributed accents had a positive effect on the detection of small temporal
variations, and the addition of irregular accents had a negative effect on it.
References
Boltz, M. & Jones, M.R. (1986). Does rule recursion make melodies easier to reproduce? If not, what does?
Cognitive Psychology, 18, 389-431.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory.
Psychological Review, 83, 323-355.
Jones, M. R. & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459-491.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context.
Perception & Psychophysics, 32, 211-218.
Jones, M. R., Summerell, L., & Marshburn, E. (1987). Recognizing melodies: A dynamic interpretation. Quarterly
Journal of Experimental Psychology, 39A, 89-121.
Tekman, H. G. (August, 1996). Effect of regular and irregular accents on detection of variations in tone sequences.
Paper presented at the 4th International Conference on Music Perception and Cognition, Montreal, Canada.
Back to index
Proceedings paper
Introduction
Timing in this narrower sense can be considered on several levels. The most
general level would be the average tempo (the reciprocal of the total time) of
a performance. Also, there may be tempo changes for sections of a piece (like a
new tempo indication, or piu or meno mosso). On an intermediate level there are
tempo deviations which affect a few successive notes in the same direction and
which are perceived as ritardando or accelerando. At last, as only modern
performance research has revealed, there are oscillating "micro" deviations
from the theoretical (mathematical) durations of successive tones which are
perceived not as tempo changes, but as making up the "correct" rhythm. These
micro deviations are present everywhere in a performance, and they have a
systematic character: they are repeated in parallel (analogous) places of a
piece, in repeated performances by the same player, and in performances by
different players (see, e.g., Gabrielsson 1987, 90 f.; Palmer 1989, 336 f.,
342). Therefore one can try to find rules for these deviations (see, e.g.,
Friberg 1991). So in the first 8 measures of Mozart's K. 331, a theoretical
duration relation of 3:1 tends to be sharpened - the second note is shortened
relative to the first note -, and a 2:1 tends to be flattened - the second note
is lengthened (Gabrielsson 1987, 83, 92 f.).
Mazzola and Beran (1998, 1999) analysed the IO's in the 28 performances of
Schumann's "Traeumerei" (op. 15, 7) provided by Bruno H. Repp. The properties
of the score to which they related the IO's were (1) tempo prescriptions
The weights are described by the authors as follows (1999, 52): "A note in the
score is metrically important if it is part of many [repeating metric
structures]. ... Essentially, a note is considered melodically important if it
is part of motifs that are similar to many other motifs that occur elsewhere in
the score. Finally, the harmonic analysis gives higher weights to notes that
are harmonically important in the sense of an extended classical theory of
harmony (Riemann theory)." Let us add that the characterisation of a motif
comprises only the "melodic" (pitch) and not the metric (duration) aspect
(1998, 46) which is treated separately - presumably in order to reduce somewhat
the enormous combinatorics involved in the motivic analysis. (A further
reduction of the melodic aspect from the size to the mere direction of the
intervals was envisaged by Mazzola on other occasions.) In fact, the
computation of the weights "involve[s] a large number of combinatorial
calculations. For example, the motivic calculations exceed any reasonable
amount of calculation if handled with ideal boundary conditions. The same is
true for harmonic weights." (1999, 78.)
The weights often vary strongly between successive events, and it was
considered appropriate by the authors to use them also in smoothed forms. The
value of a smoothed weight for an event was the mean of the weights within a
symmetrical window around the event (in fact, a weighted mean with the weight
decreasing linearly with increasing distance from the event at the centre of
the window - 1999, 60; 1998, 49 f.). Larger window widths correspond to
stronger smoothing. The authors applied widths of 8, 4, and 2 bars (1999, 67).
In addition, the first and second time derivatives of these weight functions
were used, as approximated on the basis of the finite time differences (1999,
67).
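The smoothing described above (a weighted mean with coefficients decreasing linearly with distance from the window centre) amounts to a triangular moving average; a sketch, without the authors' exact edge handling, which is not specified here:

```python
def smooth(weights, half_width):
    """Weighted mean over a symmetric window; the coefficient of each
    neighbour decreases linearly with its distance from the centre."""
    out = []
    n = len(weights)
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - half_width), min(n, i + half_width + 1)):
            c = half_width + 1 - abs(i - j)   # triangular coefficient
            num += c * weights[j]
            den += c
        out.append(num / den)
    return out

# A single spike is spread out over its neighbours:
print(smooth([0, 0, 10, 0, 0], half_width=1))  # → [0.0, 2.5, 5.0, 2.5, 0.0]
```

Wider windows (larger `half_width`) correspond to the stronger smoothing the authors obtain with 8-bar windows.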
attributed, according to the model, to the independent variables. For the mean
performance the authors obtained R²=0.84 (1998, 54). This means that the
similarity between the timing of the real performance and the version
constructed by the statistical model from the properties of the score was
R²=0.84 or R=0.92. For the 28 individual performances R² ranged between 0.65
and 0.85 (1999, 69; 1998, 54).
It is good to remember that some trivial success is certain for any model even
if there is no true relationship at all between the dependent and the
independent variables. This is so because the sample correlations fluctuate
around their true values, and the model exploits these random correlations. Let
n=sample size, then, in the multinormal case, the expected sample value of R²
when its true value is zero is given by E(R²)=df/(n-1) (see Kendall and Stuart,
1973, p. 354, formula (27.84)). In the present case there were n=212 events
(including the repetition of the first 8 bars - 1999, 69) to which the
regression model was applied, so with df=57 an R² of approximately 57/211=0.27
(or R=0.52!) is to be expected in any case and would figure as completely
insignificant.
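The expectation E(R²) = df/(n-1) under the null hypothesis can be checked by simulation; a sketch using ordinary least squares with purely random regressors (numpy assumed available):

```python
import numpy as np

def null_r2(n, k, reps=300, seed=0):
    """Mean sample R^2 when y is unrelated to the k random regressors."""
    rng = np.random.default_rng(seed)
    r2 = []
    for _ in range(reps):
        X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
        y = rng.standard_normal(n)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        res = y - X @ beta
        r2.append(1 - (res @ res) / ((y - y.mean()) ** 2).sum())
    return float(np.mean(r2))

# With n=212 events and df=57 regressors, as in the text:
print(null_r2(212, 57))   # close to E(R^2) = 57/211 ≈ 0.27
```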
The formula shows that the trivial success in terms of R² can always be
increased by choosing an appropriate value of df (by using more independent
variables - no matter what they are like). For df=n-1 (i.e., if the number of
nonredundant independent variables is equal to the number of data points minus
one) we have E(R²)=1 (=R² in this case): such an excessively rich model always
fits exactly.
The Mazzola-Beran approach requires the RUBATO software with the implemented
mathematical definitions of the weight functions, and it requires (as the first
author stated in a radio interview) hours or days of computing time even if it
is not "handled with ideal boundary conditions", i.e., if the length of the
metric and melodic patterns taken into consideration is severely restricted.
Therefore we wanted to test whether a similar R² can be obtained on the basis
of a simple surface analysis of the score which any musician can easily give,
and which requires only a negligible amount of computing time.
Method
Besides the ritardandos and the fermata prescribed in the score (excluding the
second fermata, the one on the final chord which, for want of a successor, does
not have an IO) we chose the metric figure of which an event is an element,
restricted to the two predecessors and two successors of the event. This is a
very narrow window as compared to the Mazzola-Beran windows of up to 8 bars.
Furthermore, the RUBATO weights reflect how often a metric or melodic pattern
is repeated in the whole piece, whereas our analysis never extends beyond the
second predecessor and the second successor of an event.
indicator variables. Now while Mazzola and Beran use quantitative variables
that can take an infinity of values, our indicator variables take only two
values and thus convey much less information, so that also in this respect they
are weak competitors of the Mazzola-Beran variables.
The next and last independent variable in our model is the interval which the
melody voice has taken in reaching the present tone. (If the melody voice held
the same tone from the preceding event, a zero interval was coded, as for the
case that the melody voice repeats the tone.) This again yields a qualitative
classification (decomposable into indicator variables), for we did not want to
prejudge a linear effect of the intervals.
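Decomposing such a qualitative classification into indicator variables can be sketched as follows; the interval values are hypothetical, and dropping the first level as the reference category is one common convention, not necessarily the authors':

```python
def indicator_code(values):
    """One-hot (indicator) coding of a qualitative variable. The first
    level is dropped as the reference category, leaving k-1 nonredundant
    indicator variables for k distinct levels."""
    levels = sorted(set(values))
    return [[1 if v == lev else 0 for lev in levels[1:]] for v in values]

intervals = [0, 2, -1, 2, 0]   # hypothetical melody intervals in semitones
print(indicator_code(intervals))  # → [[1, 0], [0, 1], [0, 0], [0, 1], [1, 0]]
```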
Results
Our model was applied to a data set which (like the one used in Repp 1992) did
not contain the repetition of the first 8 bars, but instead the mean of the two
executions. Taking the mean reduces the residual variance; for independent
residuals, the variance is halved. In order to make the two models comparable,
we ran a simulated analysis with doubled residual variance for the first 8 bars
and two successive executions of these to obtain as many data points as
Mazzola-Beran, and corrected values for our R². This correction strategy is rather radical in view of the fact that the two
executions (mean performance, including simulated identical executions for two pianists who did not repeat) correlate no
less than 0.987 (Repp 1992, 2552). This means that they were practically identical and free of random errors, so that the
residuals in the regression models were in fact predominantly not random errors of repetition but systematic deviations
from the model, whose variance would not have to be doubled.
Using only the ritardandos and the first fermata (the second one is on the
final chord which, as remarked earlier, has no IO) led to R²=0.54 (uncorrected:
0.70). Adding the metric figure led to R²=0.83 (uncorrected: 0.87), and using
also the melody interval led to R²=0.90 (uncorrected: 0.91) with df=55, i.e.,
with 55 nonredundant independent (in our case, indicator) variables.
Mazzola-Beran had obtained R²=0.84 with df=57 for the mean performance, and R²=0.65...0.85 for the 28 individual performances.
Discussion
It appears that the success of the regression model based on the RUBATO depth
analysis of the score is surpassed by a model based on a very simple surface
analysis of the score and requiring only a negligible amount of computing time.
Since R² is remarkably high in both cases, it seems appropriate to discuss
possibilities of extending the analyses presented here to come closer to the
distant goal sketched in the last paragraph of the introduction.
weights, new pieces do not introduce new independent variables. The question,
of course, is how far the weights have the same effects on timing in different
pieces; and ritardandos, say, will not always be executed in the same manner as
in "Traeumerei". For our model, on the other hand, new pieces will introduce
new metric figures and melody intervals till saturation is reached.
When pieces with widely differing tempos are considered, it cannot be assumed
that a metric figure defined by the same notes is performed in the same way
e.g. in largo and in presto. It will be necessary to introduce a variable for
the approximate absolute speed at which, say, quavers are played (or for the
metronome indication for quavers). This variable will again be defined as a
qualitative classification of the tempos (decomposable into indicator
variables) in order not to prejudge a linear effect of the overall tempo.
Even then the overall tempo of a piece will hardly suffice to determine the way
metric figures are executed. Of course, if the data base is large enough, the
surrounding metric figure upon which the IO of an event is made to depend can
be extended to include more than two predecessors and successors of the event.
And, of course, further properties of the score can be introduced, e.g., a
harmonic analysis of the chords, or progressions to different keys.
Once a large variety of pieces (and interpreters) have been covered, one could
try to predict a new standard or particular performance, i.e., to generate it
on the basis of the parameters of the model as estimated from different pieces,
and to compare it, in terms of correlation, with the real standard or
particular performance. If many such attempts prove satisfactory, one might
dare to produce a synthetic performance of a piece not played by the
interpreter in question or by any one - if the piece is sufficiently similar to
those from which the parameters have been estimated. In principle it will also
be possible to generate a new style of interpretation (an artificial
interpreter) - if the parameters characterising existing interpreters are
sufficiently systematised so that meaningful interpolations or extrapolations
become possible.
The main practical problem of such an approach is to obtain the IO's. Disk
recordings are innumerable, but obtaining the onset times from wave data (as
done by Repp for "Traeumerei") is very laborious and does not seem to have been
satisfactorily automated thus far. Obtaining onsets from MIDI data is simple
and automatable, but the data base is scarce. With a recording device and
skilled volunteers, one can obtain such data. But if one wants to analyse
performances of well-known interpreters in non-wave format, the only - and
quite limited - source would seem to be the Yamaha disklavier recordings. They
are not in MIDI format, but the disklavier is reported to have a MIDI output.
Acknowledgements
References
Gabrielsson, Alf (1987): Once again: The theme from Mozart's piano sonata in A
major (K. 331): a comparison of five performances. In A. Gabrielsson (ed.),
Action and Perception in Rhythm and Music. Stockholm: Publications issued by
the Royal Swedish Academy of Music, no. 55, 81-102.
Kendall, M. G., and A. Stuart (1973): The Advanced Theory of Statistics, vol.
2, 3rd ed. London: Charles Griffin.
Proceedings paper
Introduction
The recognition of rhythmic structures is a task that music listeners master every day without
effort. Musicians and music teachers readily recognize and describe rhythmic structures. Yet it is not clear
how this task is accomplished. In Osnabrück at the Forschungsstelle Musik- und Medientechnologie
music tutorial programs are being developed. The development of these interactive music tutorial
programs brought the need for an automatic recognition of rhythmic structures, since many other
musical structures like melodies, cadences, and musical form incorporate rhythm. So a method for the
recognition of rhythmic structures is necessary for a wide range of interactive musical applications.
Most studies on rhythm perception, cognition and production focus on how a person would typically
perceive, play or memorize rhythmic structures. In interactive music applications the focus must be on
a highly flexible and fault tolerant reaction to user input. The question concerning user input is not
`How would a listener normally perceive this rhythm?' but `How can the system make sense of it?'.
Knowledge about rhythm perception is useful and necessary, of course, and should be used by the
system. But we do not have complete and exact knowledge of this field and the knowledge we have
needs to be used in a way that makes it work for the application task.
It is also desirable to make use of the knowledge of human experts. A recognition system should be
open to integrating explicit knowledge as well as to learning from implicit knowledge encoded in
examples provided by experts. Fuzzy logic systems provide a way of using incomplete and inexact
knowledge, and when combined with neural networks they can learn to change their behaviour from
examples. Compared to a plain neural network, a neuro-fuzzy system has the additional advantage
that the changes made by learning remain interpretable; it need not have the black box effect of a
neural network.
Concept
The aim is to develop a system that performs an analysis of a user's rhythmic performance as input
(per MIDI) for a given task. Output should be an assignment of structures and information on the
similarities and differences. The system should make use of results from music theory and empirical
findings. Based on these 'clues' the system should find an adequate matching of the rhythmic structure.
Also it should be able to learn from examples provided by an expert, e.g. a music teacher. Thirdly, it
should have an open architecture for the integration of further dimensions (e.g. pitch) and further
implicit and explicit knowledge.
Such a system should have certain features with regard to rhythm recognition. It should be tolerant of
expressive or imprecise timing. Similar to beat-tracking systems it should follow changes in tempo. It
should also be capable of following structurally distorted patterns, like a score following system. Yet it
should not only find synchronization points but a structural description for the whole input.
Our way to achieve this is to combine a segmentation and assignment-process with an extraction of
parameters that are rated by a fuzzy-logic system. This system can be trained by an expert user by
providing better solutions where the system makes mistakes. For training the fuzzy-logic system is
transformed into a neural network and trained with a modified backpropagation algorithm.
Using and training the system involves five stages of processing:
● segmentation
● assignment of groups
● assignment of notes within the groups
● extraction of parameters
● rating by the fuzzy-logic system
The overall rating can be interpreted as a measure for the similarity of rhythms. We have not yet
integrated an explicit model of beat or meter. If a musical beginner is playing, cues for metrical
structure may be weak or inconsistent, thus leading to inadequate results. So a system for this
application context should rather rely on the figural quality of the rhythmic patterns ([Bam80]).
Metrical information may help to improve system performance, but it does not seem essential at the
current stage of development.
It is generally agreed that perception and memory of sequences of discrete events are organized in
groups, just as letters or phonemes are grouped into words ([PP97], [Slo85], [Dem98]). Another piece of
evidence for this assumption is subjective rhythmisation. Even completely uniform isochronous
sequences of sounds are perceived as groups of tones with different accents ([Fra82], [Deu86]).
Grouping enables efficient and structured storage of rhythms in short and long term memory,
especially of repetitions. The segmentation into groups is essentially an automatic process that seems
to be governed (among other factors) by physiological constraints, for instance the capacity of sensory
memory, the numerical and temporal capacity of short-term memory and the lower boundary of
temporal discrimination of events ([Pöp89], [Sch97]). Groups as learned schemata can have an influence on the
perception of rhythms ([Swa86], [Bre90]). Although there is a considerable amount of research done
on grouping, a generally accepted model for the process of grouping auditory events has not yet been
developed.
Hierarchical levels
For the comparison of rhythms it is necessary to detect and qualify differences. There can be structural
differences, like missing events, as well as differing temporal or qualitative relations of events, like
longer notes, shorter notes or groups played faster or earlier or in differing order. We concentrate on
the onsets of the notes, since the length has less relevance for the rhythmical structure ([Han93],
[Slo85]). We have two levels of comparison:
● the comparison of rhythmic groups with respect to the temporal structure of their
elements or - in musical terms - the comparison of rhythmic motifs
● the comparison of the temporal placement of groups, i.e. their order and their relative
position or - in musical terms - comparing the rhythmic structure of a phrase or a theme
There are of course more hierarchical levels up to a whole work or song ([LJ83], [Cla87]), but these
levels are not considered here.
Which grouping of events and which relation of groups is appropriate results from matching the input
groups with the task groups. Since a group match affects the grouping of the other events, the grouping
process cannot be separated from the matching process. Humans perform this matching task in real
time: patterns are matched to known patterns or stored as new patterns. How this process is performed
is not exactly known yet. It is clear that some kind of structuring has to take place in real time since
memory and processing capacity are limited ([Cow84], [Sch97]).
Neuro-fuzzy systems
As stated before, the basis of rhythm production, perception and cognition is only partially known, or
the knowledge is vague: we know that some factors support certain effects but cannot quantify
them for arbitrary situations and cannot reduce them to a simple `if - then' condition. Often, however,
we do know tendencies, i.e. that some factor influences a certain consequence to some extent. Especially in
rhythm we cannot reduce statements to binary logic as easily as in, e.g., harmony, since the values
dealt with are not discrete. (In actual computation they are discrete, but represented at high enough
resolution to eliminate the influence of discretization.) Even if we could find absolute values at which a rhythmic
interpretation switches over, these values would change with the context. So it seems more adequate to
find different parameters and model their dynamic interaction.
A Fuzzy-Prolog program is defined by an assignment of truth values to a set of rules and facts that
is non-zero only for a finite number of rules. Further rules can be derived from these by using the
modus ponens generalized for Fuzzy-Prolog.
When trying to model how the rules contribute to an output in a fuzzy system, the rules must be
weighted individually. These weights can be ad hoc estimates, e.g. derived from the literature, but they
have to be adjusted by trial and error. The idea is now to automate this process by optimizing the
weights using sample ratings. Here we can make use of the neural net paradigm which is structurally
similar to Fuzzy-Systems. It is shown in [NKK96] that Fuzzy-Prolog programs can be transformed
into feed-forward neural nets and trained with the backpropagation algorithm if they meet some
constraints which do not mean any real limitations of the system design.
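As a minimal sketch of how weighted fuzzy rules can be evaluated (illustrative only; the rule names, the truth values and the min t-norm are assumptions, not the paper's exact operators):

```python
# A minimal sketch (not the authors' implementation) of weighted fuzzy
# rules: each rule combines its antecedent truth values with a t-norm
# (min here) and scales the result by a trainable weight.

def rule(weight, *antecedents):
    return weight * min(antecedents)

# Hypothetical truth values of facts for one input group:
tempo_stable    = 0.9
tempo_plausible = 0.8
precise         = 0.7

# Rule: good_group <- tempo_stable AND tempo_plausible AND precise
good_group = rule(0.95, tempo_stable, tempo_plausible, precise)
print(round(good_group, 3))  # 0.95 * min(antecedents) = 0.665
```

Transforming such rules into a feed-forward net amounts to making the weights the trainable parameters.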
The problem with training an ANN from sample expert ratings is that a consistent absolute rating of
examples is hard to achieve. The correct relations may become apparent only after comparing
many examples, which may make a re-rating of many examples necessary. A direct decision which one
of two examples is better is usually easy to achieve. In this situation it is desirable to train the net by
relative ratings. This can be done for feed forward nets using gradient descent training if the net is
duplicated, one for each example of the pair. This method is described in detail in [Bra97].
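The pairwise training idea can be sketched with a linear scorer standing in for the full net; the logistic pairwise loss and the feature values are assumptions for illustration, while the paper itself uses a modified backpropagation on the duplicated net ([Bra97]):

```python
import math

# Sketch of training from relative ratings: a scorer is applied to both
# examples of a pair, and gradient descent pushes the preferred
# example's score above the other's (logistic pairwise loss).

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairs(pairs, dim, lr=0.5, epochs=200):
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            d = score(w, better) - score(w, worse)
            g = -1.0 / (1.0 + math.exp(d))   # gradient of log-loss wrt d
            for i in range(dim):
                w[i] -= lr * g * (better[i] - worse[i])
    return w

# Hypothetical feature vectors (tempo, precision, order ratings):
pairs = [([0.9, 0.8, 1.0], [0.4, 0.9, 0.5]),
         ([0.8, 0.7, 0.9], [0.6, 0.8, 0.4])]
w = train_pairs(pairs, 3)
print(all(score(w, b) > score(w, a) for b, a in pairs))  # True
```

The expert only ever states which of two assignments is better, never an absolute rating.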
The program RhythmScan is an experimental implementation of our neuro-fuzzy system for rhythm
recognition. It is meant to be used by students and teachers. The student plays the input for a given
task on a keyboard. This can be a MIDI keyboard or the computer keyboard, but the latter provides no
velocity data, which reduces the input by one dimension. The input is analyzed and the teacher
provides the corrections of system output for training the system. The program computes an analysis
of the input which states the assignment of groups, the assignment of notes within the groups and
information about the differences between the input and the task. If the teacher finds an inadequate
assignment of the rhythmic structure he or she can provide a better assignment which is stored as a
training sample pair. With these samples the program can train the neuro-fuzzy system. The training
optimizes the weights of the rules in the Fuzzy-Prolog program to give the assignments by the teacher
a better rating than those produced by the system before. If the program produces new mistakes after
the training, the teacher can again provide better assignments. By this procedure we hope to iteratively
improve the overall performance of the system. Also it might be of interest to examine the weight
settings produced by the training process since the weights can be interpreted directly to the role the
corresponding parameter plays in the (machine) recognition process.
Group assignments
The computation starts with the segmentation and assignment algorithm (SAA). The SAA first defines
coversets, i.e. sets of groups, for the task and the input. Currently the only restriction on the structure of
the coversets is that the number of notes per group must be between 2 and 5. This is motivated by the
findings that groups in memory have at most 7 to 9 elements, but that with more than 4 to 5
elements sequence recall already becomes less reliable ([Mil56]). The preferred length is between 2 and 5
elements; especially in music, longer groups are subdivided into smaller units. These numbers, like all
constants in this system, are tentative and still subject to experimental change. It is desirable to
incorporate more constraints for segmentation, especially to reduce computation time but also to give
the system more clues as to which assignments are good ones. Still, the groupings should not be too
restricted, to keep the system flexible.
Note assignments
With the coversets for both task and input a group of the task is assigned to each group of the input. It
is possible to assign an input group to no task group to deal with notes that are unrelated to input e.g.
produced by problems with the MIDI instrument or by gross errors. Then the notes within the groups
are assigned. Two constraints are imposed on the note assignment:
● every input note and every task note can only be assigned once
● serial order must be respected, i.e. if two input notes a and b and two task notes c and d
are assigned a-c and b-d, then if b is after a, d must be after c.
The first constraint ensures that additional notes will be marked as such and not as timing deviations.
The second rule is obvious with plain rhythm. It might have to be changed when the system is used
with melodies where order information is provided by pitch.
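The two constraints can be checked mechanically; the following sketch represents an assignment as (input index, task index) pairs, a simplification of the system's actual data structures:

```python
# Sketch of the two note-assignment constraints: indices may be used at
# most once, and the mapping must preserve serial order.

def valid_assignment(pairs):
    ins = [i for i, _ in pairs]
    tasks = [t for _, t in pairs]
    once = len(set(ins)) == len(ins) and len(set(tasks)) == len(tasks)
    by_input = sorted(pairs)
    ordered = all(t1 < t2 for (_, t1), (_, t2)
                  in zip(by_input, by_input[1:]))
    return once and ordered

print(valid_assignment([(0, 0), (1, 1), (2, 3)]))  # True: order preserved
print(valid_assignment([(0, 1), (1, 0)]))          # False: order violated
```

Unassigned input notes simply do not appear in the pair list and are reported as structural errors.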
The rules are similar to those outlined in [EW96]. Since they work well so far they have not been
included into training by the neural net yet. Four parameters are extracted for each input group:
● group correctness: It is allowed to leave up to two notes unassigned. These unassigned notes are
regarded as structural errors where different rules apply to errors that leave the rest of the notes
unchanged in structure or errors that involve shifting the following notes on the time axis. The
first type can occur for instance by hitting a wrong key and moving quickly to the correct one
without losing tempo while the second type is mostly a reading or memory error which usually
involves a metrical shift as well.
● group tempo: An input group's tempo is retrieved by calculating all tempo variants based on
taking two assigned notes to define the relation to the input pattern. All these tempo variants are
considered when searching for the best assignment of notes within the groups. The quality of a
group's tempo is calculated from its stability with respect to the tempo of the last group (or the
given tempo for the first group) and its plausibility, i.e. whether the tempo is within the range of
musically plausible tempos.
● group precision: This value is calculated from the sum of squared deviations of the assigned
notes (corrected for the tempo and position differences of the groups)
● group position: The position relative to the expected beginning as calculated from the group
before or the initial tempo and position. Here metrical aspects are implicit if the task is
metrically structured.
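A minimal sketch of the precision parameter, assuming (for illustration only) that tempo is expressed in seconds per beat and position as an onset time in seconds:

```python
# Sketch of the group-precision parameter: the sum of squared onset
# deviations of the assigned notes once the group's tempo and position
# have been accounted for. All values are hypothetical.

def group_precision(input_onsets, task_onsets, tempo, position):
    # Map task onsets (beats) into input time using the group's tempo
    # (s/beat) and position (s), then sum squared residuals.
    expected = [position + t * tempo for t in task_onsets]
    return sum((i - e) ** 2 for i, e in zip(input_onsets, expected))

task   = [0.0, 0.5, 1.0, 1.5]        # score onsets in beats
played = [2.00, 2.26, 2.49, 2.77]    # measured onsets in seconds
print(round(group_precision(played, task, tempo=0.5, position=2.0), 4))
```

A perfectly timed group yields zero; the fuzzy rating then maps this value to a truth degree.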
The whole structure of coversets, group assignments and note assignments is called a coverset
assignment. When the best coverset assignment is found we can determine the tempo per group, the
deviation of the groups from their expected positions, and the deviation of the notes. This
information can then be used to generate an adequate reaction for the user.
Terms that do not appear on the left hand side of a rule are facts. Their truth values are calculated from
the input data. The facts in rule (6), aeqb(), aeq2b(), aeqb2(), aeq3b() and aeqb3(), compare
the tempo of the input group with that of the task group. Since errors may lead to double or half
tempo, or even triple or one-third tempo, the last four rules are introduced, but their weight is
(initially and after training) much lower than that of the rule checking for identical tempo.
The plausibility of a group's tempo (GTpoPlsbl) states whether it is within the range of musically
sensible tempos. We use a trapezoid function which decreases above 200 and below 60. This is a
preliminary solution. It might improve results to use a more elaborate model like [Par94].
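The trapezoid function can be sketched as follows; the 60-200 range is taken from the text, while the ramp width of 20 BPM is an assumption for illustration:

```python
# Sketch of the trapezoid plausibility function: full membership for
# tempos between 60 and 200 BPM, decreasing linearly outside. The
# ramp width (20 BPM) is assumed, not taken from the paper.

def tempo_plausibility(bpm, lo=60.0, hi=200.0, ramp=20.0):
    if lo <= bpm <= hi:
        return 1.0
    if bpm < lo:
        return max(0.0, 1.0 - (lo - bpm) / ramp)
    return max(0.0, 1.0 - (bpm - hi) / ramp)

print(tempo_plausibility(120))   # 1.0
print(tempo_plausibility(210))   # 0.5
print(tempo_plausibility(250))   # 0.0
```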
The precision value (GPrcsn) is calculated as the sum of squares of the deviations of the assigned
notes. The correctness (GCorrect) is computed from the error values for unassigned notes. The order
(COrder) of a coverset assignment is calculated by counting groups that are missing, added or in
wrong order.
The overall rating of a coverset assignment (ca) (3) is combined from the four parts tempo,
correctness, precision and order. The rating for the tempo is calculated from the corresponding group
ratings (4). Those are combined from stability and plausibility (5). Tempo stability is combined from
the conjunction of the comparisons of the five variants (6). The correctness (8), precision (7) and
relative position (10) are combined from the conjunction of the corresponding group values.
These functions allow for an amount of compensation between the operands that can be adjusted by
the q parameter; for these functions we use a q value of 2. At one extreme of q the combination
approaches min, at the other it approaches max. The use of these evaluation functions instead of
weighted sums makes a change in the standard backpropagation algorithm necessary.
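One family of functions with this compensation property is the generalized (power) mean, shown here purely as an illustration; the paper's actual evaluation functions may differ:

```python
# Illustration of a compensatory combination with an adjustable
# parameter q (generalized power mean, an assumption, not the authors'
# exact operator): very negative q approaches min, very positive q
# approaches max, and moderate q lets operands compensate.

def power_mean(values, q):
    n = len(values)
    return (sum(v ** q for v in values) / n) ** (1.0 / q)

vals = [0.9, 0.5, 0.7]
print(round(power_mean(vals, -50), 3))  # close to min(vals) = 0.5
print(round(power_mean(vals, 50), 3))   # close to max(vals) = 0.9
print(round(power_mean(vals, 2), 3))    # compensating value in between
```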
Discussion
The RhythmScan program is still work in progress and has to date not been tested thoroughly and
systematically. First experiments show some strengths and weaknesses. The assignment of notes
within the groups works well so far. The assignment of groups did not always lead to correct results,
which is why we used the neural net for this part of the system. False assignments could so far be
eliminated by training. It might of course be necessary to extract more parameters in the preprocessing
that can be used by the net. The principle of optimizing with relative ratings is absolutely necessary to
make optimization feasible: the expert (music teacher) can enter his or her preferred assignment when
the system makes a mistake, without having to take care of the consistency of ratings, which would be
practically impossible.
A problem at the current stage of development is that, due to combinatorial explosion in the
segmentation and assignment process, computation takes too long for use in interactive systems for
all but very short patterns. Here optimization is necessary, especially for use in real-time systems.
The system seems to reflect relevant aspects of rhythmic structure. It recognizes rhythmic structures
without explicitly modeling metrical aspects. Nevertheless a model of metrical aspects should be
integrated in the future. Feedback about deviations of structure and tempo is useful when the group
assignment is correct, but greater reliability is needed for practical use. When that is achieved the
system should be able to aid learners in interactive music tutorials. Provided computational
efficiency is increased, the system could also be used for other tasks like automatic notation or meter
recognition if the task is replaced by a library of rhythmic or metrical patterns or an ad hoc pattern
generator. Although it is not primarily a model of perception it could be used for testing hypotheses
and discovering factors in grouping and timing of musical events.
References
Bam80
J. Bamberger. Cognitive structuring in the apprehension and description of simple rhythms.
Archives of Psychology, 48:177-199, 1980.
Bra97
Heinrich Braun. Neuronale Netze. Springer, Berlin Heidelberg, 1997.
Bre90
A. Bregman. Auditory scene analysis: The perceptual organization of sound. The MIT Press,
Cambridge, Mass., 1990.
Cla87
Eric F. Clarke. Levels of structure in the organization of musical time. Contemporary Music
Review, 2(1):211-38, 1987.
Cow84
Nelson Cowan. On short and long auditory stores. Psychological Bulletin, 96(2):341-370, 1984.
Dem98
Steven M. Demorest. The role of phrase groupings in children's memory for melodies. In
Suk Won Yi, Hee Sook Oh, Sang Wook Nam, Serin Kim, and Mee Bae Lee, editors,
Proceedings of the Fifth International Conference on Music Perception and Cognition, pages
75-80, Seoul, Korea, 1998. Western Music Research Institute, Seoul National University.
Deu86
Diana Deutsch. Auditory pattern recognition. In K. R. Boff, L. Kaufman, and J. P. Thomas,
editors, Handbook of Perception and Human Performance: Cognitive Processes and
Performance, volume 2. John Wiley and Sons, New York, 1986.
EW96
Bernd Enders and Tillman Weyde. Automatische Rhythmuserkennung und -vergleich mit Hilfe
von Fuzzy-Logik. Systematische Musikwissenschaft, IV(1-2):101-113, 1996.
Fra82
Paul Fraisse. Rhythm and tempo. In D. Deutsch, editor, The Psychology of Music, chapter 6, pages 149-180. Academic Press, New York, 1982.
Han93
S. Handel. The effect of tempo and tone duration on rhythm discrimination. Perception and
Proceedings paper
1. INTRODUCTION
We often perceive rhythmic movements in our behavior and environment. Fraisse (1982) reviewed
psychological experiments concerned with rhythmic perception and behavior. In this article, he defined a
cadence as the rhythm produced by the simple repetition of the same stimulus at a constant rate, and
designated it as the basis of all rhythms. He reviewed studies regarding spontaneous and preferred tempi
of cadence, and then discussed the organization of rhythms based on the accents which were added to the
cadence in intensity, duration etc. He finally showed that the performed tempi of many musical works
correlate with the spontaneous and preferred tempi of the cadence.
Moreover, in the field of musicology, Cooper and Meyer (1960) defined a pulse as one of the series of
regularly recurring, precisely equivalent stimuli. They assigned the series of the pulse as the basis of
musical rhythms and analyzed the complex rhythmic structures of musical pieces. Fraisse (1982) and
Cooper and Meyer (1960) suggest that the rhythmic behavior of music is based on the behavior of the
cadence or pulses.
Equal interval tapping produces the cadence or pulses. Musha et al. (1985) requested non-musicians and
an amateur pianist to tap castanets at equal intervals under two conditions: Under one set of conditions,
the subjects synchronized their tapping to the ticking of a metronome (metronome tapping). Under the
other set of conditions, the subjects only listened to the metronome before tapping but not while tapping
(free tapping). They analyzed the temporal fluctuation observed in these tapping experiments using
Fourier analysis and determined the power spectrum of the fluctuation. As a result, in the case of free
tapping, the power of the fluctuation was small and constant in the high frequency region above 0.1 Hz,
while it increased as the frequency decreased in the low frequency region below 0.1 Hz. The amplitude
or the power of the fluctuation indicates the difficulty of temporal control and the frequency of 0.1 Hz
corresponds to a period of 10 sec. Therefore, the critical phenomenon in the spectrum of free tapping
implied that the temporal controllability was excellent for a period which was less than 10 sec, but for
periods over 10 sec, the controllability worsened as the period increased. On the other hand, for
metronome tapping, the power was large and constant in the high frequency region above 0.1 Hz, similar
to free tapping, while it never increased as the frequency decreased in the low frequency region below
0.1 Hz. The power of the low frequency components was still as large as the power of the high frequency
components, or the power decreased as the frequency decreased. Musha and his colleagues suggested
that the critical phenomena of 0.1 Hz indicated that temporal control in equal interval tapping is
governed by a memory of 10 sec.
Yamada (1996, 1998) pointed out that there was a possibility that the memory capacity corresponded not
to a real time of 10 sec, but to a given number of taps, because the tempi were limited to 300-500 ms/tap
in the experiments of Musha et al. (1985). Yamada conducted free tapping experiments at various
tempi ranging from 180 to 800 ms/tap. He requested that the musicians tap at equal intervals without
metronome ticking, using the index or middle fingers of their right hands. As a result, the critical
period determined by the method of least mean squares was around 20 taps for all tempi and for all
subjects. ANOVA showed neither significant main effects nor the interaction with regard to the factors of
tempo and finger used. Moreover, he applied auto-regressive (AR) models to the temporal fluctuation of
free tapping. The best AR model was determined as the model that minimizes the value of Akaike's
Information Criterion (Akaike, 1969). The order of the best AR model was also around 20 for all tempi
and for all subjects. Yamada (1996, 1998) concluded that the memory capacity, which governs equal
interval tapping, was not 10 sec, but 20 taps, i.e., the preceding 20 intervals of the tapping is preserved
and used to determine the interval of the present tap.
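The AR-model selection described above can be sketched as follows; the least-squares fitting method and the synthetic data are assumptions for illustration, not Yamada's exact procedure:

```python
import math, random

# Sketch of AR-model analysis: fit AR(p) models to an IOI series by
# least squares and compare orders with AIC = N * ln(RSS / N) + 2p
# (Akaike, 1969). The fitting method is assumed for illustration.

def fit_ar(x, p):
    n = len(x)
    rows = [x[t - p:t][::-1] for t in range(p, n)]   # lagged predictors
    y = x[p:]
    # Normal equations, solved by Gaussian elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(p)]
    for i in range(p):
        for k in range(i + 1, p):
            f = a[k][i] / a[i][i]
            for j in range(p):
                a[k][j] -= f * a[i][j]
            b[k] -= f * b[i]
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, p))) / a[i][i]
    rss = sum((yt - sum(c * r[i] for i, c in enumerate(coef))) ** 2
              for r, yt in zip(rows, y))
    return coef, rss

def aic(x, p):
    _, rss = fit_ar(x, p)
    n = len(x) - p
    return n * math.log(rss / n) + 2 * p

# Synthetic AR(1)-like IOI fluctuation (deterministic seed):
random.seed(1)
x = [0.0]
for _ in range(400):
    x.append(0.8 * x[-1] + random.gauss(0, 0.05))
coef, _ = fit_ar(x, 1)
print(round(coef[0], 2))  # recovered coefficient near 0.8
```

In Yamada's analysis the order minimizing AIC was around 20 for all tempi and subjects.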
Musha et al. (1985) used non-musicians and an amateur musician as subjects, while Yamada (1996,
1998) used only musicians. Yamada and Tsumura (1997) investigated the temporal controllability in
equal interval tapping as a function of musical training. They used skilled pianists and novice pianists as
subjects. As a result, when they performed tapping with one finger, skilled and novice pianists showed
the same temporal controllability and they consistently showed the critical phenomenon of 20 taps in the
spectrum. However, when they used multiple fingers, there were significant differences between the two
groups: The temporal controllability of the skilled pianists was unchanged, while the temporal
controllability of the novice pianists significantly decreased. These results suggested that the critical
phenomenon of 20 taps, which was observed in single finger free tapping, correlated with a basic feature
of temporal control that did not change with musical training.
The series of experiments by Yamada and his colleague verified that the 20-tap memory associated with
free tapping existed with various tempi. On the other hand, Musha et al. (1985) made both free tapping
and metronome tapping experiments, but the tempi were limited to 300-500 ms/tap. Therefore, the
control of equal interval tapping with metronome ticking at various tempi has not yet been clarified.
In the present study, we have conducted free and metronome tapping experiments at various tempi to
examine the temporal control of metronome tapping in relation to the control of free tapping.
2. EXPERIMENTAL METHOD
Ten students from the Department of Musicology at the Osaka University of Arts were used as subjects.
While in a soundproof room, the subjects tapped an aluminum board on a table with the middle fingers of
their right hands, keeping the interval and intensity as equal as possible. All subjects had experience in
playing the piano and other instruments, but only at intermediate levels. Each subject performed equal
interval tapping at the tempi of 200, 370 and 800 ms/tap and at the spontaneous tempo, i.e., the comfortable tempo
for the subject to tap in equal intervals. The subjects were instructed not to count numbers of taps or to
imagine music during the tapping.
In metronome tapping, subjects listened to metronome ticking during the tapping. On the other hand, in
the case of free tapping for the fixed tempi of 200, 370 and 800 ms/tap, they listened to metronome
ticking for 20 sec before each trial, but not while tapping. The subjects were not exposed to metronome
ticking in spontaneous tempo, free tapping.
The metronome ticking was produced by a computer system with a D/A converter of 48 kHz. Each tick
consisted of a 4000 Hz tone with the triangle time envelope of 6 ms. The metronome ticking was
presented through headphones at about 73 dB(A). Small speakers attached to the aluminum board
converted the pressure generated by the subject's finger to a voltage. The computer system
converted this voltage to numeric data with a 12 kHz sampling A/D converter and measured the
inter-onset intervals (IOIs) of the tapping. The voltage was also used for each subject to monitor the
clicking sounds of his/her own tapping. The clicking sounds were monitored at about 73 dB(A) through
the same headphones through which the metronome ticking was presented.
One trial of tapping consists of 1701 taps. In some cases, the IOI was not stable in the initial 100 taps,
therefore we used the stable IOI fluctuation of 1600 taps, from the 101st to the 1700th tap, for the
analysis in all tapping trials. In some cases of free tapping, the IOI showed the divergence phenomenon,
i.e., the IOI gradually increased or decreased, and the IOI values of the last part were quite different
from the values of the initial part. We defined as a failed trial any case in which the mean IOI of the
last ten taps differed by more than 20% from the mean IOI of the 101st to the 110th taps.
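The failed-trial criterion can be stated directly in code (the IOI values below are hypothetical):

```python
# Sketch of the failed-trial criterion: a trial fails when the mean IOI
# of the last ten taps differs by more than 20% from the mean IOI of
# taps 101-110.

def is_failed_trial(iois):
    # iois: list of inter-onset intervals (s) for one trial (1700 values).
    ref = sum(iois[100:110]) / 10.0    # taps 101-110 (0-based slicing)
    last = sum(iois[-10:]) / 10.0
    return abs(last - ref) / ref > 0.20

stable   = [0.370] * 1700
drifting = [0.370] * 1690 + [0.470] * 10   # IOI drifted upward at the end
print(is_failed_trial(stable), is_failed_trial(drifting))  # False True
```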
The experiment was organized as follows: it consisted of four blocks, each corresponding to one of the
previously defined tempi. Each block consisted of two phases, each corresponding to one condition
(free or metronome tapping). The order of the blocks was randomized for each subject. The
order of the phases was also randomized for each block and for each subject with the exception of the
spontaneous tempo block. In the spontaneous tempo block, the free tapping phase was first performed,
then the average tempo of the spontaneous tempo was calculated and this tempo was used for the
following metronome tapping. In each phase, each subject carried out the trials until the number of
successful trials (not failed trials) reached seven. In free tapping phases, subjects carried out seven to 15
trials including failed trials, but there were no failed trials in the metronome tapping phases. A 5-10 min
rest separated the trials, and the subjects took a rest between phases and between blocks for at least 20
min. Each subject performed one to three phases a day and completed the entire experiment within seven
to ten days. 35 successful trials were obtained (five subjects, seven trials) for each tempo and each
condition by the process described above.
3. RESULTS AND DISCUSSION
The IOI was plotted as a function of the order of the taps. As mentioned above, because the IOI was not
consistently stable in the initial portion of the trials, the initial 100 taps in the IOI fluctuation were
eliminated. The fluctuation of the remaining 1600 taps was decomposed into Fourier components by
DFT with a Hanning window, and the power spectrum was calculated for each trial. The power was
averaged over every 1/2-octave band, and then the resulting spectra were averaged over the same tempo
and the same condition on a logarithmic scale. Using this process, a smooth 1/2-octave band power
spectrum was obtained for each tempo and for each condition.
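The spectral analysis can be sketched as follows; a short synthetic IOI series stands in for the 1600-tap trials, and the 1/2-octave band averaging step is omitted for brevity:

```python
import math

# Sketch of the spectral analysis above: the IOI fluctuation is
# mean-removed, windowed (Hanning), and decomposed by a direct DFT;
# the power at k cycles per trial is |X_k|^2.

def power_spectrum(x):
    n = len(x)
    mean = sum(x) / n
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xs = [(xi - mean) * wi for xi, wi in zip(x, w)]
    spec = []
    for k in range(1, n // 2):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(xs))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(xs))
        spec.append(re * re + im * im)
    return spec  # power at 1, 2, ..., n/2 - 1 cycles per trial

# Hypothetical IOI series with a slow fluctuation at 4 cycles per trial:
n = 200
iois = [0.370 + 0.010 * math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
spec = power_spectrum(iois)
print(spec.index(max(spec)) + 1)  # peak frequency in cycles -> 4
```

Averaging such spectra over 1/2-octave bands and over trials yields the smooth spectra of Fig. 1.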
Fig. 1 Power spectra of temporal fluctuation in free tapping (lines with filled marks) and metronome
tapping (lines with empty marks) for various tempi.
3.1. Free Tapping
The lines with filled marks in Fig. 1 show the power spectra of the temporal fluctuation for free tapping.
As can be seen, the spectral features are similar for all tempi. In the high frequency region above
approximately 80 cycles for 1600 taps, the power is constant or slightly increases as the frequency
increases. On the other hand, the power increases as the frequency decreases in the low frequency region
below 80 cycles. The power of a frequency component indicates the difficulty of temporal control for the
frequency, and the relation between frequency f [cycles] and period p [taps] is p = 1600 / f.
Therefore the spectral features show that the temporal control in free tapping is excellent for a short
period which is less than approximately 20 taps, but the control becomes worse as the period increases in
the long period region above 20 taps. These spectral features are consistent with the 20-tap memory
shown by Yamada (1996,1998), although the critical period is not definitively shown in the present
study.
This uncertainty may stem from the averaging process: Yamada (1996, 1998) calculated the power spectrum for each subject. In those spectra, the critical change in control was clearly observed, and the critical period was distributed around 20 taps, ranging from 12 to 27 taps. In the present study, these individual spectra were averaged into one spectrum. This averaging may have smoothed out the features of the spectrum, leaving the critical period uncertain.
The lines with filled marks in Fig. 1 also show that slower tempi resulted in a higher spectral position on the power axis. Yamada (1998) showed that the coefficient of variation of the IOI in free tapping was consistently distributed around 4.3% for various tempi. Because the mean IOI at a slow tempo is long, the standard deviation, and hence the power of the fluctuation, is large. Therefore, the difference in spectral position between different tempi for free tapping in Fig. 1 is consistent with Yamada (1998).
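This tempo dependence can be made explicit with a short worked calculation (my own, not from the paper). With a constant coefficient of variation, the standard deviation of the IOI, and hence the fluctuation power, scales with the mean IOI:

```latex
\sigma = c_v \cdot \overline{\mathrm{IOI}}, \qquad c_v \approx 0.043, \qquad P \propto \sigma^2
```

For a mean IOI of 200 ms this gives sigma of about 8.6 ms, while for 800 ms it gives about 34.4 ms; quadrupling the mean IOI therefore raises the total fluctuation power by a factor of roughly 16, consistent with the higher spectral position of the slower tempi in Fig. 1.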
In conclusion, the spectra for free tapping in Fig. 1 suggest that the temporal control of free tapping is characterized by the 20-tap memory and a consistent coefficient of variation of IOI, as in the previous studies by Yamada.
In the low frequency region below two cycles, i.e., the long period region above 800 taps, the power seems to approach a plateau. This phenomenon requires further study, with experiments in which tapping is observed over a larger number of taps.
3.2. Metronome Tapping
The lines with empty marks in Fig. 1 show the power spectra for metronome tapping. There are no
significant differences between different tempi in the low frequency region below approximately 50
cycles. In this region, the slope of all spectra is steep, which implies that the power rapidly decreases as
the period increases. On the other hand, the spectral features are quite different between different tempi
in the high frequency region above 50 cycles. For example, the spectrum for the 200 ms/tap tempo shows
a decrease in slope above 50 cycles, whereas in the case of the 800 ms/tap tempo, the spectrum maintains
a steep slope up to 500 cycles, above which it then shows a decrease in slope.
The spectrum for the 800 ms/tap tempo is interpreted as follows: the power of the highest frequency components, above 400 cycles, is significantly larger than that of the other frequency components. This implies that virtually the entire fluctuation consists of a few components with short periods, below about four taps; in other words, the metronome ticking suppresses fluctuation with periods longer than four taps. In the case of the 200 ms/tap tempo, on the other hand, the fluctuation consists of many components with periods of 1-30 taps, and the metronome ticking suppresses fluctuation with periods longer than 30 taps.
Now, the question is what mechanism(s) yield such differences between the different tempi. Let us
observe the correlation between the spectra of metronome tapping and free tapping for each tempo. In the
cases of the 200 and 370 ms/tap tempi and the spontaneous tempo, the spectral features show that the
power of metronome tapping increases as the frequency increases in the low frequency region. However,
once the power intersects with the power spectrum of free tapping, the power of the metronome tapping
coincides with the spectrum of the free tapping above the intersecting frequency.
The metronome and free tapping tasks differ only in the presence of the metronome ticking. Therefore, the excellent control exhibited in metronome tapping in the low frequency region corresponds to a consistent suppression by the metronome across all tempi. This suppression may itself produce a steep slope over a wide frequency range, as in the spectrum of the 800 ms/tap tempo. In the metronome tapping task, this suppression mechanism is active; however, the 20-tap memory mechanism that governs free tapping is also active. Figure 1 clearly shows that, for the 200 and 370 ms/tap tempi and the spontaneous tempo, both mechanisms are active in metronome tapping, and the fluctuation of a frequency component is determined by whichever mechanism provides better control at that frequency.
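This "better of the two mechanisms" account can be illustrated with a toy calculation. The spectra below are entirely hypothetical (chosen only to mimic the shapes in Fig. 1, not the measured data): if each mechanism sets the fluctuation power it can achieve at each frequency, the observed spectrum follows the elementwise minimum of the two.

```python
import numpy as np

# Hypothetical mechanism spectra (shapes are assumptions, not data):
freqs = np.arange(1, 801)                  # cycles per 1600 taps
memory = 50.0 / freqs                      # 20-tap-memory-like 1/f spectrum
suppression = 10.0 * (freqs / 800.0) ** 3  # metronome suppression: steep slope

# The observed fluctuation at each frequency is set by whichever
# mechanism provides the better (lower-power) control there.
observed = np.minimum(memory, suppression)
```

Below the intersection of the two curves the suppression mechanism dominates (the steep low-frequency slope); above it the spectrum coincides with the free-tapping (memory) spectrum, as described for the 200 and 370 ms/tap tempi and the spontaneous tempo.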
In the case of the 800 ms/tap tempo, the power of metronome tapping is significantly larger than that of
free tapping in the high frequency region. This suggests that metronome tapping is governed only by the
suppression mechanism in the case of a slow tempo. Why the 20-tap memory mechanism is not active for
metronome tapping in the slow tempo of 800 ms/tap still remains to be studied.
In the low frequency region below two or three cycles, the power seems to approach a minimum value. As with the plateau observed in free tapping, this phenomenon also requires further study in which tapping is observed over a larger number of taps.
4. CONCLUSIONS
The present study confirms that the temporal control of free tapping is governed by a 20-tap memory across various tempi. Moreover, it shows that in metronome tapping a suppression mechanism due to the metronome operates across various tempi. For fast and intermediate tempi, the temporal control of metronome tapping is governed by both the suppression and the 20-tap memory mechanisms. However, in the case of a slow tempo, the 20-tap memory mechanism does not govern metronome tapping directly. This aspect of the 20-tap memory mechanism requires further study.
REFERENCES
Akaike, H. (1969). Fitting autoregressive models for prediction. Ann. Inst. Statist. Math. 21, 243-247.
Cooper, G. W. and Meyer, L. B. (1960). The Rhythmic Structure of Music. Chicago, The Univ. of
Chicago Press, pp. 3-4.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.). The Psychology of Music. New York,
Academic Press, pp.149-180.
Musha, T., Katsurai, K. and Terauch, Y. (1985). Fluctuations of human tapping intervals. IEEE Trans.
Biomed. Eng. BME-32, 578-582.
Yamada, M. (1996). Temporal control mechanism in equaled interval tapping. Appl. Hum. Sci. 15, 105-110.
Yamada, M. (1998). Temporal fluctuation in musical performances - Fluctuations caused by the limitation of performers' controllability and by artistic expression -. Proc. 5th Int'l. Conf. Music Percept. Cogn., 353-358.
Yamada, M. and Tsumura, T. (1997). Do piano lessons improve basic temporal controllability of
maintaining a uniform tempo? J. Acoust. Soc. Jpn. (E) 19, 121-131.
Back to index
Proceedings paper
For time-difference stereo, each instrument is recorded on the left and right channels of a CD-R at the same level but with a specified time difference between the channels. The time difference for each instrument is decided from the difference in distance to the instrument's assumed position. Because the time difference between the two channels for each instrument was given exactly in integer multiples of the sampling interval, MIDI sound production introduced no timing error.
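The recording scheme just described can be sketched in code. The function name is mine, and the 44.1 kHz CD sampling rate (sampling interval of about 22.7 microseconds) is assumed; positive delays mean the left channel leads, following the paper's sign convention.

```python
import numpy as np

def time_difference_pan(mono, delay_samples):
    """Place a mono signal in 2-channel time-difference stereo.
    Positive delay_samples: the left channel leads (sound radiated
    from the left loudspeaker first); negative: the right leads.
    Levels are identical on both channels."""
    n = len(mono)
    d = abs(int(delay_samples))
    left = np.zeros(n + d)
    right = np.zeros(n + d)
    if delay_samples >= 0:
        left[:n] += mono          # left channel first
        right[d:d + n] += mono    # right channel delayed by d samples
    else:
        right[:n] += mono
        left[d:d + n] += mono
    return np.stack([left, right], axis=1)

# e.g. a 50-sample lead at 44.1 kHz corresponds to 50 / 44100 s, about 1.13 ms
```

Summing the stereo outputs of all five instruments, each with its own signed delay as in Table 1, yields the two-channel test stimulus.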
Table 1 shows the experimental design for the time differences assigned to each instrument for the configurations in Fig. 1, where In represents the n-th instrument and Sn denotes the n-th stimulus. The unit of time in Table 1 is the sampling interval of the CD, i.e., about 22.7 microseconds. Positive values mean that the sound is radiated from the left loudspeaker first, and negative values from the right first, relative to the reference timing assigned to the centre.
Table 2 shows the reproduction level of the loudspeakers for each instrument for the configurations in Fig. 1. The unit of level in Table 2 is dB. Positive values mean that the presentation level is higher than the reference level assigned for the centre, and negative values lower. Table 3 shows the two patterns of instrument assignment used in the experiments.
Table 1: Time differences assigned to each instrument (unit: sampling intervals)
       I1    I2    I3    I4    I5
S1     10     5     0    -5   -10
S2     30    15     0   -15   -30
S3     50    25     0   -25   -50

Table 2: Reproduction levels assigned to each instrument (unit: dB)
       I1    I2    I3    I4    I5
S8      1   0.5     0  -0.5    -1
S9      4     2     0    -2    -4
S11    10     5     0    -5   -10
Table 3: Patterns of instrument assignment
I1 I2 I3 I4 I5
Pattern 1 Fl Bs Gt Vib Ds
Pattern 2 Vib Ds Bs Gt Fl
6. COMMENTS ON STIMULI
Wavefronts
As the precedence effect is also called the "law of the first wavefront", it matters greatly which ear, left or right, the wavefront reaches first, although even for continuous waveforms the temporal correspondence between the signals at the two ears may be sensed if the sounds are not exactly periodic. Some concern arises over the onset shapes of the sounds employed here, as the precedence effect may differ depending on the waveforms of the wavefronts. The waveforms of the virtual instruments employed in the present experiment differ significantly and may affect the results, so the beginning parts of the waveforms of the instruments are provided in Fig. 3 for future reference.
Figure 3: Beginning parts of the waveforms: (a) Flute (b) Vibraphone (c) Guitar (d) Bass guitar (e) Drums
Figure 2: Three methods to generate the test signals having d-point difference: (a) preceding (b) delayed (c) shared time difference
7. EXPERIMENTAL RESULTS
Figure 4 compares the performance of the proposed time difference method and the conventional level difference method for the loudspeaker-listener configurations depicted in Fig. 2. The abscissa for the time difference method is the absolute maximum time difference, in sampling counts, while that for the level difference method is the maximum level difference, in dB, both assigned to the left- or right-most instruments. The ordinate denotes the average of the direction IDs that subjects reported. The proposed method locates clearer source images roughly in the desired direction (not exactly, but in the direction of the loudspeakers) than the level difference method does, even in cases where the level of sound from the loudspeaker located in the direction of the presumed source is lower than that from the other.
Comparing the performance of the two methods for configuration 1, the standard listening configuration, the upper half of Fig. 4 shows that the time difference method yields sharp source image displacements with a time difference as small as 50 samples, i.e., about 1 ms, while the level difference method shows a gradual movement of instrument images as the level difference increases. The level difference method can locate sources in any direction between the loudspeakers, but the time difference method locates sources only in the direction of one of the two loudspeakers or exactly at the centre between them.
Figure 4: Performance comparison between Time-Difference Stereo and Level-Difference Stereo for configurations depicted in Fig. 2.
Pn: Instrument Assignment Pattern n (n=1~5). Cn: Configuration Pattern n (n=1~5).
Ordinate: perceived direction (+ = right, - = left).
Abscissa: time difference (samples) for the left-most instrument in Time-Difference stereo, and level difference (dB) for the left-most instrument in Level-Difference stereo.
In the car cabin situation, as shown in the lower half of Fig. 4, the time difference method performs far better than the level difference method. Whereas the source images produced by the level difference method are pulled toward the nearer loudspeaker, the time difference method locates the source images near the designed directions even though the relative sound levels suggest otherwise. The two figures at the bottom left of Fig. 4 indicate that the time difference required for satisfactory localization is around 200 sampling intervals or more, i.e., 4 to 5 ms.
Subjects perceive an echo-like sensation when two sounds have a time difference of more than about 800 sampling intervals, i.e., 18 ms, particularly for pulsive sounds such as percussion. Some subjects reported that the flute is difficult to localize and that its image sometimes moves after it has once been fixed.
8. CONCLUSIONS
It is confirmed that stereophonic effects are obtained by the proposed time difference method with delays of 1 to 8 ms, and that it yields better localization performance than the conventional level difference method, in the car-cabin situation in particular. Although live recording for the proposed method is problematic, time-difference recording is easily accomplished with MIDI synthesizers. Moreover, conventional 2-channel stereo reproduction systems are fully compatible with disks or tapes recorded in the proposed scheme.
Further studies are required on the precedence effect for periodic signals such as the synthetic flute sound, and on the localizability of the time difference method in arbitrary directions where no loudspeaker exists. The precedence effect for composite signals should also be studied, as little is known about signals other than sinusoidal waves. Although it is reported that the precedence effect is not obtained for sinusoidal waves above 1 kHz, many frequency components generally exist in ordinary sounds, including those generated by MIDI synthesizers. Basic research on the precedence effect for composite sounds containing higher frequency components is therefore expected.
ACKNOWLEDGEMENT
The authors thank Prof. Masayuki Morimoto, Kobe University, for his initial help and useful comments on this work, which has been partly supported by a Grant-in-Aid for Scientific Research (C) [10680395] "Stream Segregation in Music", Ministry of Education, Science and Culture, Japan.
REFERENCES
Blauert, J. (1983) Spatial Hearing. MIT Press, Cambridge.
David, E. E., Guttman, N., & van Bergeijk, W. A. (1959) Binaural interaction of high-frequency complex stimuli. J. Acoust. Soc. Am.,
Vol.31, pp.774-782.
Gilkey, R. H., & Anderson, T. R. (1997) Binaural and spatial hearing in real and virtual environments. pp.191-197.
Gourevitch, G. (1987) Directional Hearing. Springer Verlag, pp. 85-98.
Harris, G. G. (1960) Binaural interaction of impulsive stimuli and pure tones. J. Acoust. Soc. Am., Vol.32, pp.685-692.
Ito, Y., Ishiyama, Y., Ishii, H., & Ogushi, K. (1994) Study on escape guidance with voice using precedence effect in two-dimensional space.
Back to index
Symposium introduction
Rationale: Although there is a growing body of literature focusing upon issues surrounding children's composition, there is still much to be understood about this topical and, at times, controversial area. This symposium brings together a number of researchers who have focused upon
developing both a theoretical and practical understanding in different but related ways. Particular
emphasis is placed upon investigating the musical and social psychological processes involved in
compositional activities.
Aims: The aim of this symposium is to present a number of complementary approaches to
understanding children's compositional activities.
Speakers: Two of the papers investigate the nature of children's collaborative compositions while two
of the presentations discuss music activities that children undertake by themselves. Louise Morgan,
David Hargreaves & Richard Joiner focus upon the group dynamics (both social and musical)
involved when children work in mixed gender groups. Raymond MacDonald, Dorothy Miell and
Laura Mitchell outline musical and verbal coding systems that can be utilised for analysing the
processes occurring between children working in pairs, highlighting the importance of social factors
such as friendship. Frederick Seddon and Susan O'Neill will discuss computer based composition,
with an emphasis upon the impact that formal music tuition has upon process and outcome in
composition. Charles Byrne focuses upon computer technology that can be used to enhance children's
musical inventing skills. His paper outlines a theoretical context for Spider's Web Composing Lessons,
a World Wide Web-based interactive teaching resource.
Back to index
Proceedings paper
It is possible to engage in these 'on task' activities with or without employing the replay facility available in the 'Cubase' composition program. This replay facility enables previously recorded parts to be heard during 'on task' activities. The 'click' (an electronic metronome device to assist performance in strict time) may be used as a substitute for replay of a previously recorded part if preferred. If replay (or 'click') facilities are engaged during 'on task' activities, this is labelled as engaging with 'aural reference'; if they are not, this is labelled as engaging without 'aural reference'. The second transcripts were then coded using the 'analysis codes'. The 'coded transcripts', containing sequentially numbered coded 'events', were also colour co-ordinated for the type of instrumental sound used (see Table 2).
Table 2
Coding of transcript
Session one
Event 1: PK/NAR
Event 2: PK/NAR
Playing keyboard on bass sound experimenting.
Event 3: PK/NAR
Using 'cut and paste' techniques a 'construction of parts' document was produced from the 'coded transcripts' for each participant in order to
trace the development of each instrumental sound part in the sequential order of 'events' (see Table 3).
Table 3
Construction of parts
Session one
Event 3: PK/NAR
Playing keyboard on sound one experimenting.
Event 5: PK/NAR
Event 8: PK/AR
Using 'midi files' a 'musical score' of each instrumental part (as it was saved at the end of each recording session) was produced to allow for
comparison and cross referencing with the 'construction of parts' document (see Example 1).
By cross referencing all three documents (coding of transcript, construction of parts and musical scores) it was possible to trace
sequentially the 'activities' involved in the development of each instrumental part to include not only musical material that was recorded
and retained but also musical material that had been discarded. This gave a very detailed record of the composition process for each
participant. Examination of these detailed records revealed emerging patterns of behaviour enabling the formulation of propositional
statements leading to 'rules of inclusion'. Typical examples of two of the emerging patterns are described below along with their
propositional statements and 'rules of inclusion'.
Example of 'Pattern A' composition strategy.
Neither of the other two sounds (strings and cello) is engaged with until the piano part is completed. At Event 11 (RK/AR), the participant begins recording the cello 'accompaniment' with aural reference to the piano part. This first recording of the cello part is made without prior experimentation and is deleted. The keyboard is then played on the cello sound, developing the same part; the participant chooses not to aurally reference the piano part while doing so. Event 15 (RK/AR) results in a 14-bar cello part recorded with aural reference to the piano part. This part is subsequently reduced to 9 bars through a 'cut/delete' note edit of bars 9-14 (Event 17) and remains as in Ex. 2 until it is deleted at the beginning of session two (Event 43).
Ex. 2.
The string part is started at Event 18: PK/NAR (i.e., playing keyboard on string sound); as with the cello part, the participant chooses not to aurally reference the previously recorded parts. Three recordings of the same string part are made with aural reference to the piano and cello parts and are subsequently deleted before the fourth recording of the same part is made, also with aural reference to the piano and cello parts. The fourth recording remains, but the last three bars, 13-15, are 'cut/deleted' in a note edit (Event 32). The string part remains as in Ex. 3 until it is deleted in session two (Event 54).
Ex. 3.
Session two
After replay, the cello part is deleted (Event 43) and re-recorded, this time four bars longer (bars 10-13). This extension to the part is made without any prior keyboard playing activity. Two recordings are deleted before the third is accepted (Event 49); all recordings are made with aural reference to the piano part but with the strings muted. The cello part is now as in Ex. 4.
Ex. 4.
Comparison of Ex. 2 with Ex. 4 reveals that, although the notation of bars 1-9 appears different, the sound remains unchanged. The cello part remains unchanged from Ex. 4 in the final composition.
Having completed the cello part, the participant resumes work on the string part. The string part is deleted (Event 54) and re-recorded with a note change and a slight extension. Two recordings are rejected before the third is accepted (Event 58; see Ex. 5). All recordings are made with aural reference to the piano and cello parts.
Ex. 5.
Comparison of Ex. 3 with Ex. 5 reveals that, although the notation of bars 1-9 appears different, the sound remains unchanged. The string part remains unchanged from Ex. 5 in the final composition. There are no changes made to the piano part in this session.
Session three
There are no events during this session that make changes to any of the parts. The session is spent almost exclusively in 'off task' activity, in particular playing recognisable tunes, e.g., 'We Three Kings', 'Little Drummer Boy' and 'Super Trouper'.
This typical example of 'Pattern A' is characterised by its lack of experimentation. The melody appears during the first event, and although three recordings of the melody are made, these were made to correct performance errors. The recording of this melody which is
After deleting the 5-bar drum part (Event 22), a period of experimenting on the drum sound with and without aural reference to 'sound one' and 'click' results in a 25-bar improvised drum part (see Ex. 7). This part is 8 bars longer than the existing 'sound one' part and is recorded at Event 33 (RK/AR), making aural reference to 'sound one' and 'click' during recording.
Ex. 7.
No bass part is recorded during session one, but two events, Event 2 (PK/NAR) and Event 28 (PK/AR), reveal experimentation with and without aural reference.
Session two
Bars 18-25 of the session one drum part (Ex. 7) are deleted after replay at Event 37. The deleted section (bars 18-25) of the original drum
part is replaced and extended to 39 bars by 'overdubbing' at Event 41 (RK/AR). Comparison of (Ex. 7) and (Ex. 8) reveals this change to
the drum part.
Ex. 8.
After a brief period of experimentation with possible bass 'riffs' (Event 43, PK/NAR), a bass part is recorded with aural reference to the 'sound one' and drum parts (Event 46, RK/AR). This recording of the bass part runs from bars 21-30, but bars 28-30 are subsequently deleted in a note edit (Event 48, ED(N)/AR). A period of practice (Events 50 and 52, PK/AR), in which the participant plays the keyboard on the bass sound while the drum part replays, results in an extension by 'overdubbing' of the bass part from bars 28-43 (Event 54, RK/AR). Bars 36-43 of this recording had not been practised prior to recording and were an improvised section, which was subsequently deleted (Event 57). The bass part at the end of session two is bars 21-36 (see Ex. 9).
Ex. 9.
The participant then moves to work on the drum part by playing the keyboard on the drum sound (Events 68 and 70, PK/AR), with aural reference to the latest bass recording (see Ex. 10). Eventually the drum part is extended by 'overdubbing' to bar 51, with aural reference to the bass part (Event 71, RK/AR; see Ex. 11).
Ex. 11.
The final event (Event 73) is the recording of a solo 'coda' on sound one bars 52-65 (see Ex. 12).
Ex. 12.
This typical example of 'Pattern B' is characterised by the way the composition develops over all three sessions. The participant experiments with alternative musical material for the instrumental parts, employing aural referencing techniques. This is exemplified by the sequence of coded events PK/AR-RK/AR-RP/AR followed by a considered response to the outcome. Musical material is reviewed, and sections are deleted and replaced by different material, indicating that closure of the creative process is not reached (if at all) until late in the composition process. On occasion the composition is extended by employing improvisation techniques, with the previously recorded parts providing the stimulus for the current improvisation. It is doubtful that these improvised parts remain in aural memory. The absence of any 'off task' behaviours, and the fact that the final event is a 'recording with the keyboard' event, further indicate that the creative process is ongoing rather than completed.
Propositional statement for 'rules of inclusion' for 'Pattern B'
'Pattern B' is characterised by the predominant use of 'improvising' techniques rather than 'practising' techniques during 'playing the keyboard' activities. New ideas are experimented with throughout the process of composition. Recording is used to 'capture' improvisations that may or may not survive in the final composition. The composition evolves in sections rather than each part being through-composed. Aural reference is made during 'playing the keyboard' activities in addition to 'recording' activities.
A further three 'patterns' of composition have emerged from the data: Patterns 'C', 'D' and 'E'. Broad descriptions of these patterns appear
below but space restrictions do not allow for detailed examples of these 'patterns' like those provided for 'A' and 'B'.
'Pattern C'
As with 'Pattern A', individual parts are completed from beginning to end before the next part is engaged with. The main difference is that the harmonic structure of the accompaniment is completed first, with the melody composed to fall within the harmonic boundaries created by the accompaniment.
'Pattern D'
The main focus of attention in this 'pattern' is to achieve synchronisation between all three parts in strict time. A melody is recorded from beginning to end using performance skill. On discovering the high level of performance skill required to synchronise the remaining parts exactly with the original, the remaining time is spent manipulating various editing techniques ('copy/paste') to ensure the timing of the parts is exactly the same. This can lead to parts being identical except for timbre.
'Pattern E'
This 'pattern' is predominantly random in nature, lacking any observable structure in either the use of the program or the composition process. Both the Cubase program and the available sounds are extensively experimented with. The 'composition' is formed by the expiration of the allowed time rather than by a decision to 'save' what has been produced.
Implications: The data collection methods described above address the shortcomings of past studies. Detailed and complete data can be collected with reduced 'surveillance effect' and without relying upon participants' memory, levels of awareness or articulation skills. The proposed procedures for analysis provide a more appropriate method for this type of data than can be achieved solely through a statistical analysis of time spent in each activity.
We are currently engaged in a study for which data has been collected and has been through the initial process of analysis described in this paper. This study involves 48 adolescents (aged 13-14 years). Twenty-five (12 female, 13 male) had between 2-4 years' prior experience of FIMT and twenty-three (12 female, 11 male) had no prior experience of FIMT. A review of the initial non-coded transcripts suggests some of the identified 'patterns' of composition were adopted by groups of participants identified by their prior experience and/or gender (e.g., 8 of the 12 females with prior experience of FIMT adopted 'Pattern A' and 6 of the 8 participants who
The detailed coded analysis of data from all 48 participants is in progress, and it is predicted that the results will further our understanding of the extent to which: a) adolescents employ different 'patterns of composition', similar to those described above, when engaged in computer-based composition and b) these 'patterns of composition' may be linked to prior experience of FIMT and gender.
References
Daignault, L. (1996). A study of children's creative musical thinking within the context of a computer-supported improvisational approach
to composition. Unpublished doctoral dissertation. Chicago, U.S.A.: Northwestern University.
Folkestad, G. (1991). Music composition in the upper primary school with the help of synthesisers-sequencers. (Report No. 1991:19),
Stockholm: Center for Research in Music Education.
Folkestad, G., Hargreaves, D. J., & Lindstrom, B. (1998). Compositional strategies in computer-based music making. British Journal of Music Education, 15(1), 83-97.
Getzels, J., & Csikszentmihalyi, M. (1976). The creative vision: a longitudinal study of problem finding in art. New York: John Wiley.
Glaser, B.G., & Strauss, A.L. (1967). The discovery of grounded theory. Chicago, Il: Aldine.
Goertz, J.P. & LeCompte, M.D. (1981). Ethnographic research and the problem of data reduction. Anthropology and Education Quarterly,
12, pp.51-70.
Hickey, M. (1995). Qualitative and quantitative relationships between children's creative musical thinking processes and products.
Unpublished doctoral dissertation. Chicago, U.S.A.: Northwestern University.
Hickey, M. (1997). The Computer as a tool in creative music making. Research Studies in Music Education No.8 July 1997.
Lincoln, Y. & Guba, E. (1985). Naturalistic enquiry. Beverly Hills, CA: Sage.
Maykut, P., & Morehouse, R. (1994). Beginning qualitative research: A philosophic and practical guide. London: The Falmer Press
Odman, P. J. (1992). Didactical/phenomenological aspects of creative music making with the help of computers. In Datorer I musikundervisningen (11-21), Stockholm: Center for Research in Music Education.
Richardson, C.P., & Whitaker, N.L. (1996). Thinking about think alouds in music education research. Research Studies in Music Education, No. 6, June 1996, pp. 38-49.
Scripp, L., Meyaard, J., and Davidson, L. (1988). Discerning musical development: Using computers to discover what we know. Journal of
Aesthetic Education, 22 (1), 75-88.
Seddon, F.A., & O'Neill, S.A. (1999). An evaluation study of computer-based compositions by children with and without prior experience of formal instrumental music tuition. Accepted for publication, Psychology of Music, January 1999.
Sloboda, J. A. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford University Press.
Strauss, A. & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.
Appendix A
Analysis codes:
'On task activities'
PK/AR Playing the musical keyboard with aural reference to previously recorded part(s) or with 'click' (similar to playing with a metronome) during replay, experimenting with and developing ideas or practising prior to recording.
PK/NAR Playing the musical keyboard without aural reference to either 'click' or a previously recorded part(s).
RK/AR Recording using the musical keyboard with aural reference to either 'click' or a previously recorded part(s).
RK/NAR Recording using the musical keyboard without aural reference to either 'click' or a previously recorded part(s).
'Replay' Codes
('Specific' replays that take place immediately following editing procedures are included in 'edit' coding with and without AR respectively.)
RP/AR 'Global' replay with aural reference to previously recorded parts or 'click'.
RP/NAR 'Global' replay without aural reference to previously recorded parts or 'click'.
'Editing' Codes.
ED(N)/AR Edits performed to change notes (or a group of notes) in time and/or pitch, erase, insert, extend, or identify, with aural
reference to previously recorded part(s).
ED(N)/NAR Edits performed to change notes (or a group of notes) in time and/or pitch, erase, insert, extend, or identify, without
aural reference to previously recorded part(s).
ED(V)/AR Edits performed to change the volume of notes (or a group of notes) with aural reference to previously recorded part(s).
ED(V)/NAR Edits performed to change the volume of notes (or a group of notes) without aural reference to previously recorded part(s).
DP Deletes part
C/P Copy/paste of part.
TC Tempo change
'Off Task activities'
'Off task' behaviours codes
PI Period of inactivity (Mouse remains motionless for more than 5 seconds).
PPI Prolonged period of inactivity (Mouse remains motionless for longer than one minute).
PK/OT Playing the keyboard in 'off task' way, random, displaying 'frustration' or 'recognisable' tunes.
'Error' Codes
EUP Events resulting from 'errors' using the program, 'accidents' or actions exploring the program that have no obvious intent.
PM Program malfunctions resulting from unknown sources or misuse of the program.
PD Program defaults preventing participants actions (e.g., 'cut' mid bar or when program defaults to the start when record button is
pressed.)
Proceedings paper
convention that the first entry in the vector corresponds to the tone C, the second to C#/Db, the third
to D, and so on, then the vector for C major is: <6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39,
3.66, 2.29, 2.88>, the vector for C# major is: <2.88, 6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19,
2.39, 3.66, 2.29>, and so on. The vectors for the different keys result from shifting the entries by the
number of places appropriate to the tonic of the key.
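The transposition-shift construction can be sketched in a few lines of Python. The helper name `key_vector` is illustrative; the values are the C major entries quoted above, with entry 0 always denoting the tone C.

```python
# Sketch of the transposition-shift construction: entry 0 of every vector
# denotes the tone C, so the profile for a key whose tonic lies `tonic`
# semitones above C is a rotation of the tonic-on-C profile.
C_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def key_vector(profile, tonic):
    """Rotate `profile` (tonic on C) so its tonic lies `tonic` semitones above C."""
    return [profile[(pc - tonic) % 12] for pc in range(12)]

c_sharp_major = key_vector(C_MAJOR, 1)
# c_sharp_major == [2.88, 6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29]
```

Applying `key_vector` with tonics 0 through 11 to the major and minor base profiles yields all 24 key vectors.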
Krumhansl and Kessler (1982) then used these data to study how the sense of key develops and
changes over time. They used ten nine-chord sequences, some of which contained modulations
between keys. Listeners did the probe tone task after the first chord, then after the first two chords,
then after the first three chords, and continued until the full sequence was heard. This meant that 12
(probe tones) x 9 (chord positions) x 10 (sequences) = 1080 judgments were made by each listener.
Each of the 90 sets of probe tone ratings was compared with the ratings made for the unambiguous
key-defining contexts. That is, each set of probe tone ratings was correlated with the K-K profiles for
the 24 major and minor keys. For some of the sets of probe tone ratings (some probe positions in some
of the chord sequences), a high correlation was found, indicating a strong sense of key. For other sets
of probe tone ratings, no key was highly correlated, which was interpreted as an ambiguous sense of
key.
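This correlational key-finding step can be sketched as follows. The helper names (`key_profiles`, `best_key`) are assumptions of this sketch; the major and minor base profiles are the published K-K values.

```python
import math

# Illustrative sketch: correlate one set of probe-tone ratings with all
# 24 K-K profiles and report the best-matching key. The base profiles
# are the published K-K values; the helper names are assumptions.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def pearson(x, y):
    """Product-moment correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def key_profiles():
    """All 24 key profiles, built by rotating the two base profiles."""
    return {f"{NAMES[t]} {mode}": [base[(pc - t) % 12] for pc in range(12)]
            for t in range(12) for mode, base in (("major", MAJOR), ("minor", MINOR))}

def best_key(ratings):
    """Return (key name, correlation) for the best-matching K-K profile."""
    return max(((name, pearson(ratings, prof)) for name, prof in key_profiles().items()),
               key=lambda pair: pair[1])
```

A set of ratings for which no profile correlates highly would, on this reading, indicate an ambiguous sense of key.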
Probe tone methodology: The concurrent judgment
As should be obvious from the above, the retrospective probe tone judgment requires an intensive
empirical effort to trace how the sense of key develops and changes, even for short sequences. In
addition, the sequence must be interrupted, and the judgment is made only after the interruption. For
these reasons, the judgments may not faithfully mirror the experience of music in time, and we were
motivated to develop an alternative form of the probe tone methodology. In this method, which we
call the concurrent judgment, the probe tone is presented continuously while the
music is played. The complete passage is sounded together with a probe tone. Then the passage is
sounded again, this time with another probe tone. This process is continued until all probe tones have
been sounded.
In our initial application of this method, the passage was J. S. Bach's Organ Duetto IV, BWV 805. Its
duration is slightly longer than three minutes. The piece contains an interesting pattern of modulations
including a repeated, highly chromatic passage. At the beginning of the session, the listener heard the
entire passage from beginning to end without any probe tone so that they could become familiar with
the piece. During each trial, the piece was repeated twelve times, each time with a different probe
tone. The probe tone was sounded over six octaves spanning the range of the piece, similar to a
'Shepard tone'. The order of the probe tones was determined randomly and was different for each
subject.
To reduce the effects of sensory dissonance, the probe tone was sounded only in the left ear, while the
music was sounded only in the right ear. To help listeners continue to attend to the probe tone, it was
pulsed at the beginning of each measure. Listeners were instructed to use a computer mouse to move a
slider left and right to indicate the extent to which the probe tone fit with the music. The left end of
the scale was labeled "Fits poorly"; the right end of the scale was labeled "Fits well". The computer
program, written in MAX, recorded the position of the slider every 200 msec. Because the task
requires concentration, only highly trained musicians were run in this initial application.
A geometric map of key distances from the tonal hierarchies
Krumhansl and Kessler (1982) used the K-K profiles to generate a geometric representation of
musical keys. The basic assumption underlying this approach was that two keys are closely related to
each other if they have similar tonal hierarchies. That is, keys were assumed to be closely related if
tones that are stable in one key are also relatively stable in the other key. To measure the similarity of
the profiles, a product-moment correlation was used. It was computed for all possible pairs of major
and minor keys, giving a 24 x 24 matrix of similarity values showing how similar the tonal hierarchy
of each key is to every other key. The correlations between the C major profile and the 24 major and
minor keys, and the correlations between the C minor profile and all the 24 major and minor keys
were presented in Krumhansl (1990, p. 38). To give some examples, C major correlated relatively
strongly with A minor (.651), G major and F major (both .591), and with C minor (.511). C minor
correlated relatively strongly with Eb major (.651), C major (.511), Ab major (.536), and F minor and
G minor (both .339). The same transposition-shift principle can be used to find the correlations for all
pairs of major and minor keys.
A technique called multidimensional scaling was then used to create a geometric representation of the
key similarities. The algorithm locates 24 points (corresponding to the 24 major and minor keys) in a
spatial representation to best represent their similarities. It searches for an arrangement such that
points that are close correspond to keys with similar K-K profiles (as measured by the correlations). In
particular, non-metric multidimensional scaling seeks a solution such that distances between points
are (inversely) related by a monotonic function to the correlations. A measure (called 'stress')
measures the amount of deviation from the best-fitting monotonic function. The algorithm can search
for a solution in any specified number of dimensions. In this case, a good fit to the data was found in
four dimensions.
The four-dimensional solution located the 24 keys on the surface of a torus (generated by one circle in
dimensions 1 and 2, and another circle in dimensions 3 and 4). Because of this, any key can be
specified by two values: its angle on the first circle and its angle on the second circle. The result can
be depicted in two dimensions as a rectangle where it is understood that the left edge is identified with
the right edge, and the bottom edge is identified with the top edge. The solution obtained was similar
to that shown in Figure 1 (see below). As can be seen, the locations of the 24 keys are interpretable in
terms of music theory. There is one circle of fifths for major keys (...F#/Gb, Db, Ab, Eb, Bb, F, C, G,
D, A, E, B, F#/Gb..) and one circle of fifths for minor keys (...f#, c#, g#, d#/eb, bb, f, c, g, d, a, e, b,
f#,...). These wrap diagonally around the torus such that each major key is located near both its
relative minor (for example, C major and a minor) and its parallel minor (for example, C major and C
minor).
Figure 1. a) The configuration of a toroidal SOM trained with the 24 K-K profiles. b) the response of
one subject, displayed on the SOM, at a point with a clear tonality (at 9.5 measures); c) the response
of Model 1 at the same point as in b; d) the response of the subject at a point with a less clear tonality
(at 49 measures); e) the response of Model 1 at the same point as in d; f) the response of the subject at
a point with a weak tonality (at 89 measures); g) the response of Model 1 at the same point as in f.
In this manner, each of the ten nine-chord sequences used by Krumhansl and Kessler (1982) generated
a series of nine points in the torus representation of keys. For nonmodulating sequences, the points
remained in the neighborhood of the intended key. For the modulating sequences, the first points were
near the initial intended key, then shifted to the region of the second intended key. Modulations to
closely related keys appeared to be assimilated more rapidly than those to distantly related keys, that
is, the points shifted to the region of the new key more rapidly.
Measurement assumptions of the multidimensional scaling and unfolding methods
The above methods make a number of assumptions about measurement, only some of which will be
noted here. The torus representation is based on the assumption that correlations between the K-K
profiles are appropriate measures of interkey distance. It further assumes that these distances can be
represented in a relatively low-dimensional space (four dimensions). This latter assumption is
supported by the low stress values (high goodness-of-fit values) of the multidimensional scaling
solution. It was further supported by a subsidiary Fourier analysis of the K-K major and minor
profiles, which found two relatively strong harmonics (see Krumhansl, 1990, p. 101). In fact, plotting
the phases of the two Fourier components for the 24 key profiles was virtually identical to the
multidimensional scaling solution. This supports the torus representation, which consists of two
orthogonal circular components. Nonetheless, it would seem desirable to see whether an alternative
method with completely different assumptions reproduces the same toroidal representation of key
distances.
The unfolding method also adopts correlation as a measure of distances from keys, this time using the
ratings for each probe position and the K-K vectors for the 24 major and minor keys. The unfolding
technique finds the best-fitting point in the four-dimensional space containing the torus. It does not
provide a way of representing cases in which no key is strongly heard because it cannot generate
points outside the space containing the torus. Thus, an important limitation of the unfolding method is
that it does not provide a representation of the strength of the key or keys heard at each point in time.
For this reason, we sought a method that is able to represent both the region of the key or keys that are
heard, together with their strengths.
SOM map of keys
The self-organizing map (SOM; Kohonen, 1997) is an artificial neural network that simulates the
formation of ordered feature maps. The SOM consists of a two-dimensional grid of units, each of
which is associated with a reference vector. Through repeated exposure to a set of input vectors, the
SOM settles into a configuration in which the reference vectors approximate the set of input vectors
according to some similarity measure; the most commonly used similarity measures are the Euclidean
distance and the direction cosine. The direction cosine between an input vector x and a reference
vector m is defined by
cos(x, m) = (Σ_k x_k m_k) / (‖x‖ ‖m‖), where ‖·‖ denotes the Euclidean norm. (1)
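Equation 1 amounts to the cosine of the angle between the two vectors; a minimal sketch:

```python
import math

# Minimal sketch of Equation 1: the direction cosine is the dot product of
# the input and reference vectors divided by the product of their norms.
def direction_cosine(x, m):
    dot = sum(a * b for a, b in zip(x, m))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in m))
    return dot / norm
```

It equals 1 for vectors pointing in the same direction regardless of their lengths, and 0 for orthogonal vectors, which makes it a scale-invariant similarity measure.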
Another important feature of the SOM is that its configuration is organized in the sense that
neighboring units have similar reference vectors. For a trained SOM, a mapping from the input space
onto the two-dimensional grid of units can be defined by associating any given input vector with the
unit whose reference vector is most similar to that particular input vector. Because of the organization
of the reference vectors, this mapping is smooth in the sense that similar vectors are mapped onto
adjacent regions. Conceptually, the mapping can be thought of as a projection onto a non-linear
surface determined by the reference vectors.
We trained the SOM with the 24 K-K profiles. The SOM had a toroidal configuration, that is, the left
and the right edges of the map were connected to each other as were the top and the bottom edges.
The resulting map is displayed at the top of Figure 1. The configuration of the SOM is highly similar
to the multidimensional scaling solution (Krumhansl & Kessler, 1982) and the Fourier-analysis-based
projection (Krumhansl, 1990) obtained with the same set of vectors. Furthermore, Euclidean distance
and direction cosine used as similarity measures in training the SOM yielded identical maps.
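A compact, illustrative training run of this kind is sketched below. The grid size, learning schedule, and random initialization are assumptions of this sketch, not the settings used in the study; the K-K base profiles are the published values, and Euclidean distance is used for the best-matching-unit search.

```python
import math
import random

# Illustrative sketch: train a SOM with a toroidal 8 x 12 grid of units on
# the 24 K-K profiles. Grid size, learning schedule, and random initial
# reference vectors are assumptions of this sketch.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def profiles():
    """The 24 key profiles, obtained by rotating the major and minor profiles."""
    return [[base[(pc - tonic) % 12] for pc in range(12)]
            for tonic in range(12) for base in (MAJOR, MINOR)]

def wrap(a, b, n):
    """Grid distance along one axis of a torus of size n."""
    d = abs(a - b)
    return min(d, n - d)

def train_som(data, rows=8, cols=12, epochs=50, seed=0):
    rng = random.Random(seed)
    units = [[rng.uniform(2.0, 6.5) for _ in range(12)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)               # learning rate decays linearly
        sigma = max(0.5, 3.0 * (1.0 - frac))  # neighborhood radius shrinks
        for x in rng.sample(data, len(data)):
            # best-matching unit by Euclidean distance
            bmu = min(range(len(units)),
                      key=lambda u: sum((units[u][i] - x[i]) ** 2 for i in range(12)))
            br, bc = divmod(bmu, cols)
            for u, ref in enumerate(units):
                r, c = divmod(u, cols)
                # squared grid distance with wrap-around on both axes (torus)
                d2 = wrap(r, br, rows) ** 2 + wrap(c, bc, cols) ** 2
                h = lr * math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(12):
                    ref[i] += h * (x[i] - ref[i])
    return units

def quantization_error(units, data):
    """Mean distance from each input to its nearest reference vector."""
    return sum(min(math.sqrt(sum((ref[i] - x[i]) ** 2 for i in range(12)))
                   for ref in units) for x in data) / len(data)
```

Because the neighborhood wraps around both edges of the grid, keys at opposite edges of the resulting map are neighbors, which is what allows the two circles of fifths to close on the torus.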
Representing the sense of key on the SOM
In addition to this localized mapping, a distributed mapping can be defined by associating each unit
with an activation value. For each unit, this value depends on the similarity between the input vector
and the reference vector of the unit. Specifically, the units whose reference vectors are highly similar
to the input vector have a high activation, and vice versa. The activation value of each unit can be
calculated, for instance, using the direction cosine of Equation 1. Dynamically changing data from
either probe-tone experiments or key-finding models can be visualized as an activation pattern that
changes over time. The location and spread of this activation pattern provides information about the
perceived key and its strength. More specifically, a focused activation pattern implies a strong sense
of key and vice versa.
Tone transitions and key-finding
All the key-finding models presented to date are static in the sense that they ignore the temporal order
of tones. The order in which tones are played may, however, provide additional information that is
useful for key-finding. This is supported by studies on both tone transition probabilities (Fucks, 1962;
Youngblood, 1958; Knopoff & Hutchinson, 1978) and perceived stability of tone pairs in a tonal
context (Krumhansl, 1979, 1990). Fucks (1962) found that, in samples of compositions by Bach,
Beethoven, and Webern, only a small fraction of all the possible tone transitions were actually used
(the fractions were 23, 16, and 24 percent, respectively). Furthermore, Youngblood (1958) showed
that, in a sample of 20 songs by Schubert, Mendelssohn, and Schumann, there is an asymmetry in the
transition frequencies in the sense that certain tone transitions were used more often than their
inversions. For instance, the transition B-C was used 93 times, whereas the transition C-B was used
only 66 times. A similar asymmetry was found in the study on perceived stability of tone pairs in a
tonal context by Krumhansl (1990). The study showed that, after the presentation of a tonal context,
tone pairs that ended with a tone that was high in the tonal hierarchy were given higher ratings than
their inverses. For instance, in the context of C major, the ratings for the transitions B-C and C-B were
6.42 and 3.67, respectively.
Determining tone transitions in a piece of polyphonic music is not a trivial task, especially if one aims
at a representation that corresponds to perceptual reality. Even in a monophonic piece, the transitions
can be ambiguous in the sense that their perceived strengths may depend on the tempo and may vary
from one individual to another. Consider, for example, the tone sequence C4-G3-D4-G3-E4, where all
the tones have equal durations. When played slowly, this sequence is heard as a succession of tones
oscillating in pitch. With increasing tempi, however, the subsequence C4-D4-E4 becomes
increasingly prominent. This is because it is segregated from the stream of tones due to the temporal
and pitch proximity of its members. With polyphonic music, the ambiguity of tone transitions
becomes even more obvious. Consider, for instance, the sequence consisting of a C major chord
followed by a D major chord, where the tones of each chord are played simultaneously. In principle,
this passage contains nine different tone transitions. Some of these transitions are, however, perceived
as stronger than the others. For instance, the transition G-A is, due to pitch proximity, perceived as
stronger than the transition G-D.
It seems thus that the analysis of tone transitions in polyphonic music should take into account
principles of auditory stream segregation (see Bregman, 1990). Furthermore, it may be necessary to
code the presence of transitions on a continuous instead of a discrete scale. In other words, each
transition should be associated with a strength value instead of just coding whether that particular
transition is present or not. Below, a dynamical system that embraces these principles is described. In
regard to the evaluation of transition strength, the system bears a resemblance to the model of
apparent motion in music presented by Gjerdingen (1994).
Pitch transition model
Let the piece of music under examination be represented as a sequence of tones, where each tone is
associated with pitch, onset time, and duration. The main idea of the model is the following: given any
tone in the sequence, there is a transition from that tone to all the tones following that particular tone.
The strength of each transition depends on three factors: pitch proximity, temporal proximity, and
duration of tones. More specifically, a transition between two tones has the highest strength when the
tones are proximal in both pitch and time and have long durations. These three factors are
included in the following dynamical model.
Representation of input. The pitches of the chromatic scale are numbered consecutively. The onset
times of tones having pitch k are denoted by t_ki, i = 1, ..., N_k, and the offset times by t'_ki, i = 1, ..., N_k,
where N_k is the total number of times the kth pitch occurs.
Pitch vector p(t). Each component p_k(t) of the pitch vector has a non-zero value whenever a tone
with the respective pitch is sounding. It has the value of 1 at each onset at the respective pitch, decays
exponentially after that, and is set to zero at the tone offset. The time evolution of p_k(t) is governed by
the equation
dp_k/dt = -p_k/τ_p + Σ_i δ(t - t_ki), (2)
where dp_k/dt denotes the time derivative of p_k and δ the Dirac delta function (unit impulse
function). The value of the time constant τ_p is chosen so that the integral of p_k saturates at about
1 sec after tone onset, thus approximating the durational accent as a function of tone duration
(Parncutt, 1994).
Pitch memory vector m(t). The pitch memory vector provides a measure of both the
perceived durational accent and the recency of notes played at each pitch. In other words, a high value
of m_k(t) indicates that a tone with pitch k and a long duration has been played recently. The dynamics
of m_k(t) are governed by the equation
dm_k/dt = -m_k/τ_m + p_k. (3)
The time constant τ_m determines the dependence of transition strength on the temporal distance
between the tones. In the simulations, a value of τ_m corresponding to typical estimates of the length
of the auditory sensory memory has been used (Darwin, Turvey & Crowder, 1972; Fraisse, 1982;
Treisman, 1964).
Transition strength matrix s(t). The transition strength matrix provides a measure of the
instantaneous strengths of transitions between all pitch pairs. More specifically, a high value of s_jk(t)
indicates that a long tone with pitch j has been played recently and a tone with pitch k is currently
sounding. At each point of time, s_jk(t) is given by
s_jk(t) = g(|j - k|) m_j(t) p_k(t), (4)
where g is a decreasing function of the pitch distance |j - k|, implementing the dependence on pitch
proximity.
Dynamic tone transition matrix T(t). The dynamic tone transition matrix is obtained by
temporal integration of the transition strength matrix. At a given point of time, it provides a measure
of the strength and recency of each possible tone transition. The time evolution of T_jk(t) is governed
by the equation
dT_jk/dt = -T_jk/τ_m + s_jk. (5)
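The onset-impulse, exponential-decay behaviour of the pitch vector and the slower temporal integration performed by the pitch memory vector, as described above, can be sketched with a simple Euler discretization. The step size and both decay constants below are illustrative assumptions, not the model's actual parameter values.

```python
# Illustrative Euler-step sketch of a single pitch component: p receives a
# unit impulse at each onset, decays exponentially, and is reset at offsets;
# m integrates p and decays on a slower, sensory-memory time scale.
DT = 0.01      # integration step in seconds (assumption)
TAU_P = 0.5    # pitch-vector decay constant (assumption)
TAU_M = 3.0    # pitch-memory decay constant (assumption)

def simulate(notes, total_time):
    """notes: (onset, offset) pairs, in seconds, for a single pitch.
    Returns the sampled time courses of the pitch value p and memory value m."""
    p = m = 0.0
    p_trace, m_trace = [], []
    for step in range(int(total_time / DT)):
        t = step * DT
        for onset, offset in notes:
            if abs(t - onset) < DT / 2:
                p = 1.0        # unit impulse at tone onset
            if abs(t - offset) < DT / 2:
                p = 0.0        # pitch value reset at tone offset
        p += -(p / TAU_P) * DT      # exponential decay while sounding
        m += (p - m / TAU_M) * DT   # memory integrates p, decays slowly
        p_trace.append(p)
        m_trace.append(m)
    return p_trace, m_trace
```

For a single tone, p jumps to 1 at the onset, decays, and vanishes at the offset, while m rises during the tone and then fades gradually, so that a recently played long tone leaves a large memory trace.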
Key-finding Model 1
Model 1 is based on pitch class distributions only. It uses a pitch class vector c(t), which is similar
to the pitch vector p(t) used in the dynamic tone transition matrix, except that it ignores octave
information. Consequently, the vector has 12 components that represent the pitch classes. The pitch
class memory vector v(t) is obtained by temporal integration of the pitch class vector according to
the equation
dv_k/dt = -v_k/τ_m + c_k. (6)
Again, the time constant τ_m corresponds to typical estimates of the length of the auditory sensory
memory. To obtain estimates for the key, the vector v(t) is correlated with the probe-tone rating
profiles for each key.
Key-finding Model 2
Model 2 is based on tone transitions. Using the dynamic transition matrix T(t), it calculates the
octave-equivalent transition matrix U(t) according to
U_jk = Σ_{j' ≡ j (mod 12)} Σ_{k' ≡ k (mod 12)} (T_j'k' + T_k'j'). (7)
In other words, transitions whose first and second tones have identical pitch classes are considered
equivalent, and their strengths are added. Consequently, the direction of transition is not taken into
account. To obtain estimates for the key, the pitch class transition matrix U(t) is correlated with the
matrices representing the perceived stability of two-tone transitions for each key (Krumhansl, 1990).
Sample results
Figure 1 shows some sample results from one of the participants in the experiment, a highly trained
musician. This musician is a graduate student of composition with more than twenty years'
performance experience on the piano and some additional years on other instruments. Figure 1 b
shows the results for the listener at measure 9.5. A V-I cadence in A minor has just occurred and the
melody contains a descending diatonic line ending on a half-note A, followed by a tonic - leading tone
- tonic alternation. This is the conclusion of the opening passage played by the left hand only and the
right hand joins at this point in time. As can be seen, the sense of tonality is strongly focused on A
minor. Figure 1 c shows the results for Model 1 which are highly similar, again with a strong focus on
A minor. (Model 2 results were in general highly similar to Model 1, agreeing with the subject
slightly more than Model 1. Because of issues about how best to visualize the results of Model 2, we
show only Model 1 here.) Figure 1 d, e shows the results at measure 49. The right hand contains what
would be a tonic - leading tone - tonic in E major and E minor; the mode is ambiguous because both
G and G# appear. This leads to an ambiguity that spreads to other closely related keys, which contain the
other chromatic tones, C#, D#, and A#, that appear in this passage. Figure 1 f, g show the results at
measure 89. As can be seen, no clear tonal focus is found. The music is highly chromatic; of the 12
tones of the chromatic scale, all but G# appear in the three preceding measures. Thus, these
results suggest that both listeners and the algorithm can generate musically interpretable, and highly
dynamic representations of tonality.
References
Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: M.I.T. Press.
Darwin, C. J., Turvey, M. T., & Crowder, R. G. (1972). An auditory analogue of the Sperling partial
report procedure: evidence for brief auditory storage. Cognitive Psychology , 3, 255-267.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music. San Diego, CA:
Academic.
Fucks, W. (1962). Mathematical analysis of the formal structure of music. IRE Transactions on
Information Theory, 8, 225-228.
Knopoff, L. & Hutchinson, W. (1978). An index of melodic activity. Interface, 7, 205-229.
Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford.
Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal
organization in a spatial representation of musical keys. Psychological Review, 89, 334-368.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within a
diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5,
579-594.
Parncutt, R. (1994). A perceptual model of pulse salience and metrical accent in musical rhythms.
Music Perception, 11, 409-464.
Treisman, A. M. (1964). Verbal cues, language, and meaning in selective attention. American Journal
of Psychology, 77, 206-219.
Youngblood, J. E. (1958). Style as information. Journal of Music Theory, 2, 24-35.
Proceedings abstract
MAKING MUSIC MEAN
Dr Nicola Dibben
n.j.dibben@sheffield.ac.uk
Background:
In very broad terms, two received views of music and meaning can be identified.
The first is that meaning is inherent in musical material. This approach can be
seen in the work of music psychologists on emotional responses to music and the
perception of meaning. The second view is that the meanings attributed to music
are wholly constructed - an approach that can be found in recent musicological
writings.
Aims:
This paper argues against both an entirely immanent view of musical meaning,
and a naively constructivist account, and puts forward an alternative which
attempts to capture the mediating role of musical material conceived as
socially and historically constituted.
Main contributions:
I argue that music is made to mean through a range of processes (e.g. discourse
about music such as journalistic and musicological writing, the values created
for music by its use in advertising and its marketing, the rituals and
practices which accompany music performance and consumption, etc). The way in
which music is made to mean is not simply a free-for-all in which "anything
goes" but a process in which meanings are created, shared, sustained, and
appropriated. The role of musical material in this is that it bears the traces
of its history of use, and thus embodies social sediment in its material form.
Implications:
This view of musical meaning recognises the mutuality of listener and musical
material; it avoids essentialising meaning as inherent in the sounds of music
but recognises the role of compositional material as socially and historically
formed; and acknowledges the role of a wide range of social processes in the
construction of meaning in music. Empirical work which already begins to
explore meaning and music in this way is presented and the implications for
future empirical research are outlined.
Proceedings paper
1. Introduction
1.1 Aim
In the past fifty years, many studies have revealed a large number of factors that play a role
in the perception of music (Krumhansl, 2000). However, knowing what factors play a role is only a
first step towards understanding music perception. Real insight presupposes a theory that specifies
how these factors function in the processing of music, or more precisely a theory that specifies what
transformations are performed on an input leading to a mental representation. Only such a theory can
make specific predictions about how a concrete series of tones is perceived. Frameworks for a theory
of music perception have been proposed by Deutsch & Feroe (1981) and Lerdahl & Jackendoff (1983).
Although experimental evidence has been reported supporting these frameworks, no concrete
predictions can be derived from these theories.
The goal of this study is to develop a computational model, based on a set of assumptions, that
captures the on-line processing of music. The model construes music perception in terms of 1) the
activation of pertinent musical knowledge stored in the listener's long term memory, and 2) the
application of perceptual mechanisms that organize the elements in the input into a coherent mental
representation. The viability of the model is investigated in experiments that examine how perception
evolves while the stimulus is presented incrementally, by studying goodness judgments and the
expectations that arise in the process. The model we are developing mainly pertains to the stage in
which the elements in the input are transformed into a mental representation.
The points of departure of this study are: 1) a theoretical framework and 2) two earlier experimental
studies.
2. Theoretical framework
First we present a global outline of a model of music perception that schematically represents the
primary processes in music perception. See Figure 1.
First, the scheme indicates that music perception is a process in which two types of information
interact: bottom-up information, consisting of the series of pitches presented to the listener
(represented as f1, f2, f3,... in the figure), and top-down information represented by all knowledge
relevant to music perception stored in long term memory. Second, the scheme conveys the incremental
character of music perception by the cyclic pattern in which pitch input is entered sequentially and a
succession of processes is executed repeatedly. Third, three groups of processes are displayed, all
relying on information stored in long term memory (LTM) denoted by the arrows, and each generating
different perceptual products. The first group of processes relates to the establishment of the
interpretative frames required: key-inference and meter-inference respectively. The second group of
processes is concerned with encoding in which series of pitches in the input are grouped into chunks
on the basis of structural regularities. In the third group of processes, the chunks generated in the first
encoding phase are integrated into even larger chunks, leading to a complete mental representation of
the input. Next we shall describe these processes in more detail.
The process of music perception may be conceived as the mapping of the input on musical knowledge
stored in the long term memory of the listener, and as the application of perceptual mechanisms to an
input consisting of a sequence of pitches. The aim of the process is to transform a series of
unconnected pitches into an integrated mental representation in musical terms. A sequence of sounds
which is conceived musically (rather than linguistically or otherwise), will be mapped on two
dimensions: the pitch-height dimension yielding the pitch of the sound, and the key-dimension
yielding the attribute of scale-degree. The key-dimension is the hierarchically organized mental tone
space in which the relations between tones and chords are specified (Krumhansl, 1990). It serves as an
interpretational frame that supplies the musical function of the sounds. As soon as the pitches of a
sequence have activated a specific key, they are identified as tones in a scale. All tones in a key are
associated with a certain degree of 'stability' (e.g., the first tone of the scale is the most stable; the
seventh, the 'leading tone', the least stable), and with a tendency to resolve to other tones (Cooke,
1959; Povel, 1996; Zuckerkandl, 1956). Thus, in making a musical interpretation of a tone series, the
tones function simultaneously in these two dimensions. Each dimension plays a specific role in the
formation of musical percepts. Perceptual
mechanisms are applied to the input, establishing relations between its elements, and lead
to a representation in terms of clusters of tones. Thus we assume that the aim of a listener is to
generate a mental description or code that encompasses as much as possible all elements in the input.
If this process is successful, the listener will have the impression that (s)he understands the input, that
it makes sense musically. If, conversely, a listener does not succeed in finding such relations, no
coherent musical percept will result.
3. Influential earlier studies
1. Van Dyke Bingham (1910)
Two studies have played a role in shaping this research. The first is Van Dyke Bingham (1910)
who studied the factors that determine whether or not a tone series is perceived as a melody. As
an example he describes two sequences respectively containing the pitches: c' e' g' e' f' d' c' and
c' f' d' g' e' f' c'. The first of these was judged by listeners as a coherent sequence in which the
sounds seem to follow each other naturally, thus forming an esthetic unity, i.e. a melody. The
second sequence, however, was judged to be a non-melody. Van Dyke Bingham asserts that the
concept of tonality plays a decisive role in the processing of tone series as expressed in his
definition of the term:
'By a tonality is meant a group of mutually related tones, organized about a single tone, the
tonic, as the center of relations. Subjectively, a tonality is a set of expectations, a group of
melodic possibilities within which the course of the successive tones must find its way, or suffer
the penalty of not meeting these expectations or demands of the hearer and so of being rejected
as no melody.' (p. 36-37)
From this study we borrowed some of the theoretical ideas proposed above, as well as the idea
for a response, asking people whether a tone series can be conceived as a melody.
2. Cuddy, Cohen, & Mewhort (1981)
The second study is the seminal article by Cuddy, Cohen & Mewhort (1981) in which the authors studied the
perception of tone sequences having "varying degrees of musical structure". Starting from the prototypical
sequence: {C4 E4 G4 F4 D4 B3 C4}, they constructed a set of sequences by altering one or more tones thereby
gradually degrading the "harmonic structure", contour complexity, and excursion size (interval between first
and last tone). From the results of Experiment 1 in which subjects judged the "tonality or tone structure" of
32 seven-tone sequences, 5 levels of harmonic structure were constructed by combining 3 rules: 1)
diatonicism (whether or not a series consists solely of diatonic tones); 2) leading-note-to-tonic ending; 3) the
extent to which a sequence follows a I - V - I harmonic progression. These levels of harmonic structure were
factorially combined with 2 levels of contour complexity and 2 levels of excursion, yielding 20 stimuli.
Recognition of the stimuli under transposition was tested in Experiment 2, and the tonal structure of the
stimuli was rated in Experiment 3. Findings indicate that the ratings were influenced mostly by the factor
harmonic structure and less by contour and excursion.
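The factorial design described above can be sketched as follows (an illustrative sketch only; the level labels are ours, since Cuddy et al. order the 5 harmonic-structure levels empirically rather than by name):

```python
from itertools import product

# Sketch of the 5 x 2 x 2 factorial design of Cuddy, Cohen & Mewhort (1981):
# 5 levels of harmonic structure, 2 of contour complexity, 2 of excursion,
# yielding the 20 stimulus conditions mentioned in the text.
harmonic_levels = ["H1", "H2", "H3", "H4", "H5"]   # ordering determined empirically
contour_levels = ["simple", "complex"]
excursion_levels = ["small", "large"]

conditions = list(product(harmonic_levels, contour_levels, excursion_levels))
print(len(conditions))  # 20 stimulus conditions
```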
The importance of this study is twofold: the use of similarity judgments and goodness ratings to measure the
perception of tone sequences, and its aim to discover factors that play a role in the perception of tone series.
Yet the study is limited in a number of respects. First, the concept of harmonic structure is rather ambiguous:
the ordering of the 5 levels of harmonic structure is not theoretically but empirically determined. This means
that it is unclear how the three rules precisely determine the variable harmonic structure. Second, it is not
clear what the subjects actually judged: besides being asked to judge the tonality or tonal structure, they were
instructed 'to reserve the highest ratings for sequences with "musical keyness" or "completeness" and to
assign lower scale values to sequences that contained "unexpected" or "jarring" notes.' (p. 875). Thus it
seems likely that the subjects have judged how well the tone series sounded as a melody; this is supported by
a study of Smith & Cuddy (1986) that obtained comparable results when listeners rated the same sequences
on "pleasingness". Third, although the rules affect the 20 sequences used in the study, it is unclear to what
extent the rules can be generalized to other tone sequences. For instance, the sequences {C4 E4 G4 F4 B3 D4
C4} and {C4 E4 G4 B3 D4 G4 C4}, which violate the leading-note-to-tonic ending, will probably be rated about as
high as the prototypical sequence {C4 E4 G4 F4 D4 B3 C4}, which obeys that rule; and the sequence {C4 E4
F#4 G4 F4 D4 B3 C4}, violating the rule of diatonicism (but allowing anchoring), will probably be rated much
higher than the sequence {C4 E4 G4 F4 D#4 B3 C4} used in the study. These examples do not undermine the
general finding that harmonic structure plays a role, but indicate that the definition of harmonic structure in
terms of stimulus characteristics is still incomplete. Finally, although the study shows that perception is
strongly influenced by the presence of detectable structure in tone sequences, it does not indicate the concrete
processes that are performed on the input resulting in a mental description.
2. Tracing the perceptual mechanisms in music processing.
1. Introduction
As stated before, the general aim of our research is to understand the on-line processes that a listener
performs when perceiving music. This processing is conceived as the application of mechanisms that
combine elements in the input into larger chunks. The concrete goal is to develop a computational
model that describes how mechanisms are applied to the input leading to a more or less successful
mental description of the input. The success of the undertaking is determined by how well predictions
derived from the model are borne out in experiments.
Thus the specific goal of this study is to understand why some tone series are perceived as a melody
and other series are not. On the assumption that a tone series is considered a melody if the perceiver
can create an efficient code that includes, if possible, all tones of the series, the challenge for the approach
is to discover all perceptual mechanisms that listeners use in coding music.
There are a number of tasks one may use to examine the perception of tone series. Listeners may be
asked to judge the goodness or pleasantness (tonal structure etc.) of a series, or they can be asked to
indicate whether the series contains jarring notes. In other tasks subjects judge the similarity between
sequences (for which a transposition paradigm may be used), or indicate which tones they expect at
different moments in the series. In our experiments we have used goodness judgments and expected
continuations. These experiments are described below.
2. Experimental studies
1. Series containing diatonic and chromatic tones
In a few experiments (Povel & Jansen, 1998) we studied the perception of a series of tone
sequences consisting of a subset of all orderings of the collection {C4 E4 F#4 G4 Bb4}. The
presentation of a tone sequence was preceded by the chords C7 - F to induce the key of F-major.
Based on a pilot study in which subjects judged how well fragments of the tone sequences
sounded as a melody, it was hypothesized that a tone series is judged a melody if either one or
both of the mechanisms chord recognition and anchoring can be applied to the series. Chord
recognition is the mechanism that describes a series of tones as a chord, and anchoring is the
mechanism that links a tone to a (chord) tone occurring later in the series. Applied to the stimuli
used in the experiments, a sequence of tones may be conceived as a chord, namely C7, which is
feasible when the F#4, which does not belong to the chord, can be "anchored" to a subsequent
G4. Anchoring (Bharucha, 1984) may either be immediate, when the G follows the F# as in the
tone series {C4 E4 F#4 G4 Bb4}, or more or less delayed when one or more tones intervene
between the F# and G as in the series {E4 F#4 C4 G4 Bb4} or {Bb4 F#4 E4 C4 G4}.
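The anchoring mechanism just described can be sketched as follows (a minimal sketch in our own notation, not Povel & Jansen's implementation): a non-chord tone is anchored if its resolution tone occurs later in the series, and the delay is the number of intervening tones.

```python
# Sketch of the anchoring mechanism: F#4 does not belong to the C7 chord
# and must be "anchored" to a subsequent G4. A delay of 0 means immediate
# anchoring (Bharucha, 1984); larger delays mean more intervening tones.
def anchoring_delay(series, non_chord_tone="F#4", resolution="G4"):
    """Return the number of tones between the non-chord tone and its
    resolution, or None if the resolution never follows it."""
    if non_chord_tone not in series:
        return None
    i = series.index(non_chord_tone)
    for j in range(i + 1, len(series)):
        if series[j] == resolution:
            return j - i - 1   # 0 = immediate anchoring
    return None

print(anchoring_delay(["C4", "E4", "F#4", "G4", "Bb4"]))   # 0: immediate
print(anchoring_delay(["E4", "F#4", "C4", "G4", "Bb4"]))   # 1: one tone intervenes
print(anchoring_delay(["Bb4", "F#4", "E4", "C4", "G4"]))   # 2: two tones intervene
```

The three example series are the ones given in the text; the last case, where the G precedes the F#, would return None, matching the lowest-rated condition.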
This hypothesis was tested in two experiments using a paradigm in which the participants heard
stepwise lengthened fragments (beginning with a fragment of length three) and rated the
melodic goodness of the fragment (Experiment 1) or played a few tones that completed the
fragment (Experiment 2). It was found that goodness ratings were highest if the fragment only
contained elements of the C7 chord, lower if the F# was immediately followed by the G, still
lower if one tone intervened between F# and G, and lowest if the G preceded the F#.
Unexpectedly, it was found that series in which two or three tones intervened between F# and G,
were rated higher than those with only one tone between F# and G. As in these series the
non-fitting F# occurred relatively early and the last three or four tones formed a C7 chord, this
finding was tentatively explained by assuming that goodness ratings are mainly based on the
most recent tones heard. Listeners' expectations collected in the second experiment corroborated
the above findings: series that activate the chord C or C7 (according to the hypothesis) tended to
be continued with the tone F, whereas series ending with the tone F# tended to be continued
with the tone G, later followed by the tone F.
Overall, the results support the hypothesis that the coding of these sequences was based on the
application of the mechanisms chord recognition and anchoring. As the interaction between the
two mechanisms is still not quite understood, we decided to subsequently study tone sequences
only containing diatonic tones.
2. Series only containing diatonic tones
In this experiment 20 subjects rated the goodness of 60 orderings of the collection {D4 E4 F4 G4
A4 B4} on a 5-point scale. In the experiment each series was preceded by the chords G7 and C
to induce the key of C-major and presented at a different pitch height. To explain the results a
number of computational models were developed based on a set of general assumptions
concerning music perception and a number of specific assumptions regarding the processing of
music. The general assumptions were: The pitches that are the basic constituents of a tone
sequence can be conceived in two ways: 1) As a sequence of pitches forming a contour the
sequential regularities of which can be described in a code; 2) As a sequence of tones conceived
within a key as a result of which the tones acquire the perceptual attributes stability and
expectation. These assumptions lead to the hypothesis that a tone sequence will be judged a
melody if the listener can mentally construct a code that includes all tones and in which the
raised expectations are resolved.
Regarding the coding aspect we assume a number of mechanisms that organize elements in the
input into higher order mental units such as: runs, chords, trills, motives, ornaments etc.
The expectations that are created when the input is interpreted in a key, are described in terms of
vectors. A vector has a direction, that points to some future musical unit, and a magnitude
representing the strength of the expectation. Specific assumptions regarding vector assignment
are: 1) vectors may be created by all mental units in which the listener codes the input, e.g. tones
and chords; 2) vectors are assigned by reference to the currently activated region. For example
in the series {B4 F4 G4 D4 A4}, the first four tones will induce the chord G7, as a result of which
the tone A will get a vector pointing towards the closest most stable element in the G7 chord,
namely G. Specific assumptions regarding vector resolution are: a vector will resolve
(disappear) 1) with time (the magnitude decreasing with some time function); 2) if the expected
tone occurs either immediately or after some delay; 3) if the vector-carrying tone is integrated in
a code (e.g. if the series {D4 E4 F4 G4 A4 B4} is conceived as a run, only the last tone B4 will
carry a vector).
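The vector formalism above can be sketched as follows (a hedged sketch; the class and method names are ours, and the exponential decay is an assumption, since the text only states that magnitude decreases "with some time function"):

```python
from dataclasses import dataclass

@dataclass
class ExpectationVector:
    """A vector points from a coded unit towards an expected future unit,
    with a magnitude representing the strength of the expectation."""
    source: str            # unit that raised the expectation, e.g. "A4" or "G7"
    target: str            # expected resolution, e.g. "G4"
    magnitude: float       # strength of the expectation
    resolved: bool = False

    def decay(self, elapsed, half_life=2.0):
        # Assumption: exponential decay; the paper leaves the time function open.
        self.magnitude *= 0.5 ** (elapsed / half_life)

    def hear(self, tone):
        # Resolution rule 2: the vector resolves if the expected tone occurs.
        if tone == self.target:
            self.resolved = True
            self.magnitude = 0.0

# Example from the text: in {B4 F4 G4 D4 A4} the first four tones induce G7,
# so A4 carries a vector towards the closest most stable chord element, G.
v = ExpectationVector(source="A4", target="G4", magnitude=1.0)
v.decay(elapsed=2.0)       # magnitude halves after one half-life
v.hear("G4")               # expected tone occurs: vector resolves
print(v.resolved, v.magnitude)  # True 0.0
```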
Based on these assumptions, a model was developed that describes the coding of the tone series
in terms of runs and chords, and the resolution of expectations in terms of the logic of the
succession of recognized chords. The model was implemented as follows: Neighboring tones
having an interval of 1 or 2 semitones are chunked into runs, while the remaining tones are
recognized as triads on one of the seven scale degrees. Several assumptions regarding chord
recognition were made as a tone series may in principle allow for several harmonic
interpretations. For instance, the series {E4 A4 F4 D4 B4 G4} presented in the key of C-major,
may activate several chords: vi, (E4 A4); IV, (A4 F4); ii, (A4 F4 D4); vii, (F4 D4 B4); V, (D4 B4
G4); V7, (F4 D4 B4 G4); and V9, (A4 F4 D4 B4 G4). Chord recognition was implemented as
follows: 1) three different subsequent tones always lead to the unique identification of a chord;
2) two tones forming an interval of a fifth also always lead to the identification of a unique
chord; 3) a series of two different tones forming an interval of a third is interpreted as the major
triad (I, IV, or V) in which that third occurs; 4) vii is interpreted as V.
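The run-chunking step of this implementation can be sketched as follows (our own simplified sketch, not the NICI implementation): neighboring tones at most 2 semitones apart are grouped into runs, and the remaining tones are left for chord recognition.

```python
# Sketch of run chunking: neighbouring tones with an interval of 1 or 2
# semitones are chunked into runs; remaining tones go to chord recognition.
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def semitone(tone):
    """Convert e.g. 'F#4' or 'Bb4' to an absolute semitone number."""
    name, octave = tone[:-1], int(tone[-1])
    pc = PITCH_CLASS[name[0]] + name.count("#") - name.count("b")
    return 12 * octave + pc

def chunk_runs(series):
    """Group maximal stretches of neighbouring tones <= 2 semitones apart."""
    chunks = [[series[0]]]
    for prev, cur in zip(series, series[1:]):
        if abs(semitone(cur) - semitone(prev)) <= 2:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return chunks

# The scale series {D4 E4 F4 G4 A4 B4} is coded as a single run,
# so only its last tone carries a vector (resolution rule 3 above):
print(chunk_runs(["D4", "E4", "F4", "G4", "A4", "B4"]))
# Leaps break the run: {E4 A4 F4 D4 B4 G4} leaves only singletons,
# which must be accounted for by chord recognition.
print(chunk_runs(["E4", "A4", "F4", "D4", "B4", "G4"]))
```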
Finally, the logic of chord order is based on Piston's (1941/1989) Table of usual root
progressions, in which three categories of progressions were distinguished occurring in
Povel, D. J. (1996). Exploring the elementary harmonic forces in the tonal system. Psychological Research, 58,
274-283.
Povel, D. J., & Jansen, E. (1998). Perceptual Mechanisms in Music Perception. Internal Report NICI.
Smith, K. C., & Cuddy, L. L. (1986). The Pleasingness of Melodic Sequences: Contrasting Effects of Repetition
and Rule-familiarity. Psychology of Music, 14, 17-32.
Van Dyke Bingham, W. (1910). Studies in melody. Psychological Review, Monograph Supplements. Vol. XII,
Whole No. 50.
Zuckerkandl, V. (1956). Sound and Symbol. Princeton University Press.
Proceedings abstract
Patrik N. Juslin
Department of Psychology
Uppsala University
Box 1225
SE-751 42 Uppsala
SWEDEN
Background: A number of philosophers, psychologists, and natural scientists have speculated that speech and music
share a common origin - a notion that implies that the two modalities should have much in common. However, despite
considerable interest in this issue, there has been little empirical evidence to support such cross-modal parallels. This is
unfortunate since evidence of cross-modal parallels could offer a partial explanation of why music is perceived as
expressive of emotion.
Aims: This paper reports the results of a systematic review of studies of emotional expression in speech and
music performance. The principal aim of the review was to explore the extent to which there are cross-modal
similarities between speech and music performance by integrating the results from a large number of empirical studies
in both domains.
Main Contribution: The results show that there are many parallels between speech and music performance with regard
to (a) accuracy, (b) coding, (c) code usage, (d) cue intercorrelations, (e) gender differences, and (f) the use of
expressive contours. However, the results also show that many of the acoustic cues remain to be studied systematically,
and that the relationships among acoustic cues and emotions are not consistent across different conditions.
Implications: The results support the often suggested hypothesis that speech and music share a common origin. A
theoretical explanation of the obtained results is provided and implications for future research are discussed. It is
argued that cross-modal comparisons yield insights that would be difficult to obtain from studying the two domains
separately.
Proceedings abstract
HOW TO GET A PIANO INTO YOUR HEAD - EFFECTS OF PRACTICE ON CORTICAL AND
SUBCORTICAL REPRESENTATIONS OF THE SOUNDING KEYBOARD
Marc Bangert
Marc.Bangert@hmt-hannover.de
Background:
Aims:
Method:
Dissociation paradigm: Subjects' task was either to (1) listen to piano tones
passively, (2) press mute piano keys, or (3) practice on a modified piano with
randomly re-assigned key-to-pitch coupling. Data acquisition: Cortical:
32-channel DC-EEG (task-related slow potentials, event-related
desynchronisations, coherences). Subcortical: Classical conditioning of the
eyeblink reflex on particular notes and motor transfer. Behavioural: Detailed
performance analysis based on MIDI.
Results:
After practice, cortical auditory and sensorimotor areas are jointly activated
for purely auditory as well as for mute motor tasks. In addition, a right
dorsolateral prefrontal area engages in this corepresentation in beginners and
experts but not in controls who practiced on the manipulated piano in a way
that they could not establish a mental "map" of the keyboard. The eyeblink
experiment revealed interindividually heterogeneous results, but subcortical
audiomotor integration seems to be possible after years of training. The
manipulation experiments suggested a correlation between the flexibility to
re-learn a shuffled keyboard and individual practice habits (jazz vs. classic).
Conclusions:
Proceedings Paper
Dorothy Miell, Department of Psychology, The Open University, Walton Hall, Milton Keynes MK7
6AA
Telephone: + 44 (0)1908 654 546
email: D.Miell@Open.ac.uk
Introduction
This paper reports two studies which have investigated the impact that social factors have upon
children's musical creativity. We have been concerned to explore these social factors since making
music is so essentially a social process, particularly in the collaborative settings of UK classroom
music lessons, with the interaction between children both affecting and being affected by the evolving
music. In exploring the processes involved we have drawn on the literature from social and
developmental psychology, particularly the growing literature on collaborative learning, which
emphasizes the importance of quality communication between children and the importance of
agreeing shared goals and ways of working together (Kruger, 1993; Rogoff, 1990). However, most of
the studies conducted on collaborative learning are of children's maths and science work, and there are
few empirical studies of children's collaborations on more open-ended tasks such as music
composition or creative writing (see Johnston, Crook & Stevenson, 1995 for an exception). A crucial
aspect of studying collaborative music making is that music affords a channel of communication other
than verbal interaction (Morgan, 1999). In addition, there is a reciprocal interaction between the
ongoing musical and verbal communication between children in this context (MacDonald, Miell &
Morgan, in press).
A key variable investigated in the studies reported here is the effect of an existing relationship
between the children on the way in which they communicate and work together. We expected that,
given the need for quality communication and for establishing a 'shared social reality' (Rogoff, 1990)
in order to achieve successful collaboration, working with a friend would be particularly helpful for
children. This might be expected to be particularly the case in creative, open-ended tasks where the
children not only have to work together on the task itself (e.g. composing a piece of music), but also
have to define the goals of their work and negotiate with each other without a 'right answer' to guide
them, as well as stimulate and build on each other's creative input. Such interactional
work is, we hypothesised, more likely to be achieved successfully when a child is working with
someone they have an established friendship with, where they have experience of working, talking
and playing together successfully.
Study One
In this study, 10-11 year old children were asked to compose a piece of music entirely of their own
and in a style of their choosing to reflect the theme of 'the rain forest'. The children all began their
involvement with the project by attending a workshop with one of the researchers during which they
experimented with different instruments, rhythms, dynamics etc and discussed ways in which
compositions can be developed and different effects achieved. The experimental sessions involved the
children working on their compositions in same sex pairs and they were given 15 minutes to complete
the task, using a full range of instruments typically available to them in school music lessons (tuned
and untuned percussion and keyboards). Half the children worked on the task with one of their best
friends while the other half of the children worked on the task with a child from a different class who
they would have known by sight but who was not a friend.
We were interested in both the nature and quality of the interactive process as well as in the quality of
the musical end product, and with this in mind we videotaped all the composition sessions and also
recorded onto an audiocassette each pair's final performance of their composition. All the verbal
utterances and musical motifs from the videotapes were transcribed and the talk was then coded in
accordance with a system introduced by Berkowitz et al. (1980) and developed by Kruger (1992).
This coding system divides utterances into 'transactive' and 'non-transactive' types. Transactive
communication is defined as communication which builds upon and extends ideas that have already
been voiced (either by the self or the partner) and the presence of transactive communication has been
shown to be a key factor in good quality collaboration. We adapted this verbal coding system to allow
us to also code the music played by the children as either transactive or non-transactive and to track
the occurrence and elaboration of each musical motif throughout the composition session
(MacDonald, Miell & Morgan, in press). The final compositions were rated for quality by a teacher
from another school who worked from the audiotape of each composition and was unaware of the
hypothesis of the experiment, the experimental conditions and all details of individual pairs. She rated
the compositions using a set of marking scales developed by Hargreaves, Galton & Robinson (1996).
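The transactive/non-transactive coding just described can be illustrated with a small sketch (the category labels follow Kruger's scheme, but the session data and function names here are invented for illustration, not taken from the study):

```python
# Illustrative sketch of the coding analysis: each verbal utterance or
# musical motif is coded "transactive" (builds on an already-voiced idea)
# or "non-transactive", and pairs are compared on the proportion of
# transactive communication, overall or per channel.
coded_session = [
    ("talk", "transactive"), ("music", "transactive"),
    ("talk", "non-transactive"), ("music", "transactive"),
    ("talk", "transactive"),
]

def transactive_proportion(session, channel=None):
    """Proportion of transactive events, optionally restricted to one channel."""
    events = [code for ch, code in session if channel in (None, ch)]
    return sum(code == "transactive" for code in events) / len(events)

print(transactive_proportion(coded_session))           # 0.8 overall
print(transactive_proportion(coded_session, "talk"))   # 2/3 of the talk
```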
The results of this study highlighted the impact that social factors such as friendship have upon both
the process and outcomes of children's collaborative compositional work. Looking first at the outcome
measure, the teacher rated the compositions produced by friends as of significantly higher quality than
the compositions of children who had been working in non-friendship pairs. Having established this
difference in the overall quality of the music produced, we then turned to the measures of the
processes involved in the talk and music of the interaction to see if there were also differences there
which related to the outcome scores. We found that both the musical and verbal communication styles
of the friendship pairs were qualitatively different from those of the non-friends. The friends both
spoke and played more music in total than the non-friends, but also had a different pattern of
interacting within these overall differences in amount. The friendship pairs used proportionally more
transactive communication in both the verbal and the musical domains than the non-friends. This
meant that the friends were building on, extending and elaborating on each other's ideas, expressed in
both the talk and music, and developing their compositions by this gradual process of offering and
refining suggestions. This style of interaction was found to be significantly positively related to the
teacher's higher score for these pairs, suggesting that the presence of more transactive communication
was what led to the higher quality compositions from the pairs of friends. In contrast, non-friends
were more likely to spend their time in the session experimenting with the instruments for themselves
and did not offer up or develop ideas together in the same way. The smaller amount of talk which they
produced was characterised by information giving and simple, unelaborated agreements and
disagreements with each other. Sometimes the music seemed to be played to cover their
embarrassment and the lack of talk between them.
Thus it appears from this study that social factors such as friendship are key variables that influence
the nature of children's interactions - in both the verbal and musical domains. The musical coding
scheme which we developed allowed us to track interactive processes expressed musically as they
occurred in the composition sessions and holds great promise for future studies of other groups and
pairs collaborating to compose and improvise.
Study Two
A second study was designed to extend the first. Two key issues were identified as important for
further investigation. The first was to explore the extent to which the friendship effect found in the
first study might generalise to other settings. In particular, given the finding by Azmitia &
Montgomery (1993) that working with a friend mainly helped children when they tackled difficult
tasks, we wanted to vary the difficulty of the task. One way of changing the difficulty was to change
the level of structure and guidance given to the children, so that children would find they had fewer
choices and decisions to make for themselves. The participants in this case were limited to using only
a keyboard and to starting their composition with a middle C (instructions derived from a study by
Kratus (1989) looking at the composition process in 7 year old children). In order to see whether the
friendship effect was also found in younger children, this second study also involved 8 year old as
well as 11 year old children. As in the first study, children worked in same sex pairs of friends or
non-friends, with one child in each pair being more experienced musically than the other. Again as in
the first study, the interactions between the pairs and the outcome of the collaborations were
examined. Dialogue between the pairs of friends and their musical interaction were analysed using
measures of transactive and non-transactive communication. The musical processes used by the pairs
were also examined. A school music teacher finally graded each composition.
Results highlighted that older children and those working with a friend took part in more transactive
communication - both in their dialogue and in the music that they played. At 11 years old it appeared
that there were no differences between the friends and non-friends in either the amount of transactive
communication or the scores received for the compositions, whereas at 8 years old the differences
between friends and non-friends were more apparent. Compositions by younger children paired with a
friend were given a higher score by the teacher and used more transactive verbal and musical
communication. In the analysis of the musical processes used, it also became clear that 8 year olds
paired with a friend were able to organise their time to include sufficient quantities of development
and rehearsal of their piece in a manner similar to that of the older children. The 8 year olds paired
with a non-friend, however, were found to spend most time in individual exploration of sounds or
silence.
The 8-year-old children paired with a non-friend took part in considerably less transactive dialogue
than the older children and than those of the same age paired with a friend. At 11 years of age, little
difference could be seen between the discussion style of the friends and non-friends, both taking part
in high amounts of transactive and useful non-transactive dialogue such as making proposals. In this
age group, scores given to the compositions were similar for the two groups, suggesting that the ways
of working together on a structured music task were of a similar nature by this age level whether one
is working with a friend or acquaintance, although we had observed differences in the previous study
where the task was unstructured. At 8 years of age, however, the relationship with the partner appears
to make more impact on the type of discussion that took place. At this age level, those paired with
friends took part in more other-oriented transactive communication - in terms of their statements,
questions and responses. The group scoring the lowest final composition score, the 8-year-old
non-friends, were found to use the least amount of dialogue overall, in both transactive and
non-transactive categories. Their counterparts of the same age paired with a friend, meanwhile, were
discussing the task in a style much closer to that of the 11-year-old participants.
In order to find out whether the musical interaction between the children matched the differences
found in the verbal interaction, an analysis of the amounts of transactive and non-transactive musical
communication was then carried out. In comparing the two age groups, it became clear that the
younger children had spent significantly more time playing 'music for self' - experimenting
individually without any apparent attempt to communicate with the partner. The older children had
also taken part in a significantly greater amount of musical repetition, rehearsing their composition
more thoroughly. The variable of friendship also revealed expected differences such as the
non-friends playing more 'music for self', and the friends using a greater number of other-oriented
transactive musical motifs. Although these effects for friendship and age separately were found, it was
again the interaction between these two variables that clarified the different ways of working in the
pairings.
As found in the verbal interaction, little difference in musical communication could be seen between
the friends and non-friends aged 11. In the 8-year-old group, however, the amount of transactive
musical communication among friends was found to be more than double that of the non-friends. At the
same time, the musical activity of the 8-year-old non-friends consisted of more music for self and less
musical repetition. Whereas the older children and the younger children paired with friends spent time
working on one another's suggestions, therefore, this group devoted most time to experimenting with
sounds individually, and without aiming to rehearse and close on one musical product.
The musical techniques used by each child were analysed using the categories devised by Kratus
(1989). In his study he examined the minute-by-minute development of the children's music as it fell
into 4 categories: repetition, development, exploration and silence. The same analysis was conducted
on the music played by the various pairs in the present study. The older children structured their time
to include greater repetition, which gave a greater opportunity for rehearsing their composition. The
8-year-olds, meanwhile, were found to have longer periods of silence and of musical exploration -
playing for themselves rather than their partner. The younger children spent more time in complete
silence with no musical or verbal interaction going on, and when actually playing were more likely to
be experimenting with new musical ideas for themselves. It was apparent that in the younger age
group it was the non-friends who took part in a significantly greater amount of exploration. This
group was also found to spend the least amount of time on repetition.
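The minute-by-minute analysis after Kratus (1989) can be sketched as follows (the session data here is invented for illustration; only the four categories come from the study):

```python
from collections import Counter

# Sketch of the Kratus-style analysis: each minute of the 10-minute session
# is assigned one of four categories, and groups are compared on the
# resulting time budget.
session = ["exploration", "exploration", "development", "repetition",
           "development", "repetition", "repetition", "silence",
           "repetition", "repetition"]

def time_budget(minutes):
    """Minutes spent in each of Kratus's four categories."""
    counts = Counter(minutes)
    return {c: counts.get(c, 0)
            for c in ("repetition", "development", "exploration", "silence")}

print(time_budget(session))
# {'repetition': 5, 'development': 2, 'exploration': 2, 'silence': 1}
```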
The analysis of the processes used at different times during the 10-minute period furthermore revealed
the planning of time by children in each age group and type of friendship pair. The pattern of playing
over time for the 11-year-old friends and non-friends could be seen to be generally similar, with
repetition as the predominant technique, increasing over time in order to allow for sufficient rehearsal
before the end. Exploration during this time decreased gradually, and development increased around
the middle of the session yet decreased later in order to make way for further repetition as the end of
the session approached.
One aspect of the collaborative work of friends suggested by Azmitia and Montgomery (1993) was
that of greater checking and evaluating of solutions, a feature which in fact became apparent when
carrying out the study. Watching the friends at work together revealed that a noticeable part of their
discussion revolved around requests to each other to practice again, going over fine points quickly in
order to do so. The findings of the current study, particularly those relating
to the 8-year-old children were therefore found to support this theory. Whereas 8 year olds working
with a non-friend took part in little rehearsal of their composition, those working with a friend used
repetition within their 10 minutes at a level close to that of the 11 year olds. Relating to the suggestion
of Nelson and Aboud (1985) that criticism of the partner's work allows friends to give a superior
performance, it was also confirmed that friends do check and make suggestions on the partner's work
to a greater extent, taking part in greater verbal and musical transactive other-oriented suggestions.
A further proposal from the work of both Azmitia and Montgomery (1993) and from Newcomb and
Brady (1982) is that differences in performance between friends and non-friends are more likely to
emerge in challenging tasks rather than straightforward ones. This can be seen as one explanation for
the advanced interaction and superior performance of the younger friends, in that due to the younger
children having had less musical, and indeed less collaborative problem solving experience in general,
the task obviously presents a more challenging assignment for them than for the older children. For
the older children, we would suggest that no differences were found between friends and non-friends
in this study since the greater degree of structuring to the task made it easier than the more open ended
task set in Study 1, where differences between 11 year old friends and non-friends were found. In
Study 1, the children had to make a number of choices and decisions for themselves and it was
suggested that the friends were more successful since they had a more developed style for working
together and making such decisions collaboratively. With this more structured task, the advantage of
being with a friend was less apparent and the effect was not observed (as indeed is often the case in
studies of structured maths and science tasks with 11 year olds).
Hartup (1996) suggested that the important feature of friends' collaborative work was the ability to
establish 'joint productive activity' and this is backed up by several features of the children's
interactions in the current study. It became clear when conducting the study that the 8-year-olds
working with a non-friend partner required considerably more prompting to become involved in the
task. Even then, the non-friends at this age appeared unable to move from one stage of the process to
the next with ease, either maintaining a high level of experimentation throughout the 10 minutes or
waiting in silence until one partner could make a suggestion. Non-friends, then, had to struggle to
establish a way of working together before any productive activity could take place (particularly, as
here, when no structure was provided in the task instructions), a feature which comes naturally to
friends with a history of such interactions.
Despite the findings of this and previous studies of the effective verbal interaction style of
collaborating friends, the fear of off-task talk between friends often appears to stop teachers from
pairing children in this way. Previous work by Miell and MacDonald (in press) and Hartup (1996),
however, found that contrary to expectations, friends spent less time in off-task talk than non-friends.
Although, in the current study, friends often appeared more tempted to play a tune they knew to
entertain their partner, the smaller overall amount of general exploration by the friends suggests that
this did not affect the productiveness of the collaborations.
Although, therefore, more off-task talk was found between the friends, and in particular those of 11
years old, this was compensated for by the much greater amount of on-task talk by the friends. It
appears in the case of the 8-year-olds that more off-task talk in the friends did not mean that they were
communicating less effectively, merely that the non-friends generally failed to communicate with one
another at all.
A clear picture of the effect of friendship in these two groups has emerged, revealing that being in a
friendship pair does allow 8 year olds to engage in transactive discussion, musical interaction and
effective use of musical processes in a way that their non-friend same age counterparts cannot, and
that older children show the same pattern when working on a less structured task. These results
contrast with suggestions by Harrison and Pound (1996) that setting composition tasks to younger
children may curb enthusiasm and imagination; rather, it appears that collaborating with a friend
allowed 8-year-olds to use their friendship as a resource, encouraging increased motivational,
organisational and imaginative ability.
Conclusions
The two studies reported here were designed to investigate the social processes involved in children's
creative collaborations. The studies focused upon the process and outcomes of both the musical and
verbal interactions. The first study highlighted that when children aged 11 work with someone they
know well they produce proportionally more transactive communication, that is, communication that
builds upon ideas previously proposed. This result was evident in both the verbal and musical
domains. In addition, the compositions produced by pairs of friends were rated as being of a higher
quality than the compositions produced by children working with someone they did not know. In
Study 2 it was found that children aged eight produced findings similar to those of the first study
when they were working on a more structured musical task. No differences of this nature were found
between friendship and non-friendship pairs of children aged eleven in this study. It is suggested that
differences in the nature of the musical tasks employed help explain the differences in the findings of
the two studies.
REFERENCES
Azmitia, M. & Montgomery, R. (1993) 'Friendship, transactive dialogues, and the
development of scientific reasoning.' Social Development, 2 (3), 202-221
Berkowitz, M.W., Gibbs, J.C. & Broughton, J. (1980) 'The relation of moral judgement
disparity to developmental effects of peer dialogue'. Merrill-Palmer Quarterly, 26,
341-357
Gottman, J. (1983) 'How children become friends'. Monographs of the Society for
Research in Child Development, 48(3).
Hargreaves, D.J., Galton, M.J. & Robinson, S. (1996) 'Teachers' assessments of primary
children's classwork in the creative arts.' Educational Research, 38 (2), 199-211
Harrison, C. & Pound, L. (1996) 'Talking Music: Empowering children as musical
communicators.' British Journal of Music Education, 13 (3)
Hartup, W.W. (1996) 'The company they keep: friendships and their developmental
significance' Child Development, 67, 1-13
Johnson, P.G., Crook, C.K. & Stevenson, R.J. (1995) 'Childs play: Creative writing in
playful environments' in H.C. Foot, C.J. Howe, A. Anderson, A.K. Tolmie & D.A.
Warden (Eds) Group and Interactive Learning. Boston: Computational Mechanics
Publications.
Kratus, J.K. (1989) A time analysis of the compositional processes used by children aged
7 to 11. Journal of Research in Music Education, 37, 1, 5-20
Kruger, A.C. (1992) 'The effect of peer- and adult-child transactive discussions on moral
reasoning'. Merrill-Palmer Quarterly, 38, 191-211
Kruger, A.C. (1993) 'Peer collaboration: conflict, co-operation or both?' Social
Development, 2 (3), 165-182
MacDonald, R.A.R., Miell, D. & Morgan, L. (in press, 2000) Social processes and
creative collaboration in children. European Journal of the Psychology of Education.
Miell, D. & MacDonald, R.A.R. (in press, 2000). Children's creative collaborations: The
importance of friendship when working together on a musical composition. Social
Development
Morgan, L. (1999) 'Children's Collaborative Music Composition: Communication
through Music'. Unpublished dissertation, University of Leicester, UK.
Nelson, J. & Aboud, S. (1985) 'The resolution of social conflict between friends' Child
Development, 56, 1009-1017
Newcomb, A.F. & Brady, J.E. (1982) 'Mutuality in boys' friendship relations.' Child
Development, 53, 392-395.
Rogoff, B. (1990) Apprenticeship in thinking: Cognitive development in social context.
Oxford University Press: Oxford.
Proceedings paper
More about the (weak) difference between musicians' and nonmusicians' abilities to process
harmonic structures
Bigand, E.*, Poulain, B.*, D'Adamo, D.*, Madurell, F.** & Tillmann, B.***
* LEAD-CNRS, Université de Bourgogne, France
** Music Department, Université Paris IV Sorbonne
*** Dartmouth College
Musicians and non-musicians often behave similarly when they are required to perform musical tasks
that are no more familiar to the former than to the latter. The purpose of the present study was to
further investigate this issue by using an on-line experimental paradigm designed to assess the
influence of cognitive and sensory components on the development of harmonic expectancy.
Experiment 1 involved a harmonic priming task in short contexts. The aim was to assess the
contribution of musical expertise on the differentiation between regular and less regular resolutions of
a diminished chord. In one condition, a prime chord (say a B diminished chord) was followed by a
target that was either one of the four possible resolutions of the diminished chord (C major chord for
example) or a less legal resolution (C# for example). Interestingly, less legal targets share more
component tones with the prime than do legal targets. According to sensory priming, illegal targets
should be processed more easily than legal ones due to the shared component tones. According to
Western musical rules, the processing of the target chord should be more facilitated (more accurate
and faster) for legal targets than for illegal targets. The critical point of the study was to investigate
the contribution of sensory and cognitive components of harmonic priming as a function of the extent
of musical expertise. Sensory priming was expected to predominate over cognitive priming in non
musicians, and a reverse tendency was expected in musicians.
Method. Participants performed a harmonic priming task by providing a simple perceptual judgment
on the target chord. Following Bharucha and Stoeckig (1987), the required perceptual judgment
concerned the sensory consonance of the target. For the purpose of the experimental task, half of the
target chords were rendered dissonant by adding either an augmented octave or an augmented fifth to
the perfect major triad. The velocity of this added tone was adjusted in order to render the dissonance
moderately hard to perceive. Participants had to judge as quickly and as accurately as possible
whether the second chord of the pair was consonant or dissonant. This perceptual judgment does not
require participants to pay attention to the harmonic relationship between prime and target. However,
we expected that this judgment would be more or less difficult depending on this harmonic relationship.
Results. Participants with a high level of musical training and participants with no musical training
processed the target chord more easily in the legal resolution condition than in the illegal condition.
This outcome suggests that for both musicians and non-musicians, cognitive components predominate.
The goal of Experiment 2 was to extend this conclusion to longer harmonic contexts. Harmonic
priming has been shown to occur in chord sequences of different lengths for both musicians and non
musicians. In Bigand & Pineau (1997), the target chord was easier to process when it acted as a tonic
rather than as a subdominant chord in the preceding context. Bigand and Pineau's experiments were
not designed to definitely contrast the respective contribution of sensory and cognitive priming. The
chord sequences used in our present study attempted to address this issue. In two conditions (no
sensory priming conditions), the target chord never occurred in the preceding context. As a
consequence, the context never contained a tonic or a subdominant chord. Nevertheless, we expected
a stronger priming effect on the (implied) tonic chord than on the (implied) subdominant chord. In
two further conditions (sensory priming conditions), the subdominant chord occurred one or two times
in the context. The prior occurrence of the subdominant chord should increase the influence of
sensory priming. If long chord priming primarily depends on a sensory component, the processing of
the subdominant chord should be easier than the processing of the tonic target chord. If it primarily
depends on a cognitive component, the processing of the subdominant chord should remain more
difficult than the processing of the tonic target. Once again, the critical point of the study
was to assess the influence of these sensory and cognitive components as a function of the extent of
musical expertise.
Method and results. The method was identical to that of Experiment 1. The results demonstrated a
strong harmonic priming effect for the tonic chord. This priming effect was unchanged when
comparing the sensory priming conditions with the no sensory priming conditions. Once again,
musicians and non-musicians showed highly similar patterns of results.
Following the same rationale, Experiment 3 investigated the influence of horizontal motion on the
processing of the target chord. There was a main effect of horizontal motion (target chords were more
difficult to process in the bad voice-leading condition), but no influence of horizontal motion on
harmonic priming. Irrespective of the voice leading, tonic target chords were easier to process than
subdominant target chords. This finding was observed for both musicians and non-musicians.
Conclusions. These experiments provided evidence that, for both musicians and non-musicians, the
processing of subtle changes in harmonic structure involves a sophisticated cognitive component that
does not depend on the extent of musical expertise.
References.
Bharucha, J. J. & Stoeckig, K. (1987). Priming of chords: Spreading activation or
overlapping frequency spectra? Perception and Psychophysics, 41, 519-24.
Bigand, E., & Pineau, M. (1997). Global context effects on musical expectancy.
Perception and Psychophysics, 59, 1098-1107.
NOTE:
This research was supported by the International Foundation for Music Research.
Proceedings paper
While speech and music perception have received considerable attention in psychology (Deutsch, 1999; Jusczyk, 1997), relatively little
is known about the way in which humans perceive other environmental sounds. Recent literature within the field of ecological acoustics
has focused on the way in which the soundwave contains meaningful information for the listener (Ballas & Howard, 1987; Ballas, 1993;
Gaver, 1986, 1989, 1993a, 1993b; Heine & Guski, 1991; Jenison, 1997; Pressing, 1997; Rosenblum, Wuestefeld, & Anderson, 1996;
Stoffregen & Pittenger, 1995; Warren & Verbrugge, 1984). For example, Gaver (1986) proposed that acoustic properties of the sound
signal convey information that enables identification of an associated event. Sound-event mappings that express consistent information
regarding the source are termed nomic, whereas symbolic mappings consist of the pairing of unrelated dimensions. Gaver (1986)
predicted that the redundancy of information expressed within nomic mappings results in an intuitive association, and that this initial
advantage aids learning relative to the learning of symbolic mappings.
Surprisingly, few of today's 'informative' sounds would appear to build on the inherent meaning in nomic mappings. The ring of a
telephone, the buzz of an alarm clock, and the wail of an ambulance siren seem to be designed to gain attention, but may require an
additional cognitive step to link sound and meaning. Although ultimately effective, symbolic mappings of this kind may be relatively
inefficient and require an unnecessary period of learning. Accordingly, the aim of the present study was to conduct a systematic,
experimental investigation of the relative ease of learning nomic and symbolic sound-event mappings. A review of the relevant literature
begins with an outline of Gaver's theoretical framework for the field of ecological acoustics.
Everyday versus Musical Listening
Gaver (1993b) distinguished between two types of auditory perception that reflect the attentional focus of the individual. Musical
listening is the experience of perceiving properties of the proximal stimulus as it reaches the ear. Sounds are perceived in terms of pitch,
loudness, rhythm, and other acoustic components typically analysed by psychologists and psychophysicists. This auditory experience is
common during the perception of music. As an example, the sound of a melody is usually appreciated with reference to the rhythm and
pitch variations that signify the song.
The study of musical listening has received considerable attention in psychology (Deutsch, 1999; Tighe & Dowling, 1993). As a result
we understand a good deal about the way humans perceive acoustic features. However, listeners do not always focus on the proximal
stimulus but on the distal stimulus. For example, while listening to a melody it is also possible to determine the musical instrument
responsible. Contemporary psychological theories are less adept at explaining how the listener identifies the source of the sound as a
guitar.
Gaver (1993b) addressed this issue with the notion of everyday listening. During the everyday listening experience the distal stimulus is
the focus of perception. Rather than perceiving the rhythm and melodic contour of sound, the individual perceives the event responsible
for producing the soundwave. According to Gaver, perception of the event is made possible by detecting consistent causal relationships
as described by physical law. For instance, the action of plucking a metal string suspended over a resonant cavity produces a specific
pattern of air disturbances that can only be produced by a constrained number of objects and events. An individual engaged in everyday
listening is consequently able to perceive the strum of a guitar.
While everyday or musical listening can be applied during the perception of any sound, the majority of psychological research on
audition has studied the proximal stimuli as described by psychoacoustics (e.g. Rasch & Plomp, 1999). This empirical emphasis on
musical, at the expense of everyday, listening is most likely a product of the assumption that auditory stimulation requires processing
before it becomes informative. However, proponents of ecological acoustics counter this assumption, arguing that the structure of the
soundwave conveys meaning for the listener. The present study provides the first experimental examination of Gaver's conception of
everyday and musical listening, with the prediction that meaningful information regarding the sound source is accessible only during the
everyday listening experience. This hypothesis was tested by systematically manipulating instructions to encourage either everyday or
musical listening, a technique that has been shown to influence the expectations and performance of listeners (Ballas & Howard, 1987).
Learning Sound-Event Mappings
Within Gaver's (1993a, 1993b) framework, the structure of a soundwave does not specify one certain source, but rather constrains the
range of events that could have produced the sound. As a result, the specifics of the sounding event presumably have to be learned. The
association of a signal (sound) to a referent (event) can be conceptualised as a mapping (Familant &
Detweiler, 1993). A nomic mapping expresses consistent information regarding its source, whereas a
symbolic mapping pairs unrelated dimensions. In the present study, the nomic and symbolic
categories were operationalised in pilot testing using the Garner interference paradigm (Garner &
Felfoldy, 1970). The purpose of this procedure is to identify integral dimensions that combine to
produce a unitary perception. Nomic mappings may be
thought of in terms of the association of two perceptually integral dimensions, as both the signal and referent depict the same event. As
predicted, pitch and bar length were found to be perceptually integral, suggesting that they are nomically related. Damping and bar
length were confirmed as a symbolic mapping.
Acquisition of nomic and symbolic mappings was tested using a variation of the paired-associate learning task employed by Leiser,
Avons, and Carr (1989). There are several advantages provided by this design that are relevant. Most importantly, participants learn the
correct sound-meaning associations 'online' during an experiment via feedback: participants were required to guess the required
combinations at first exposure. This feature is useful for examining Gaver's (1986) claim that the intuitive association of sound and the
circumstances of production should be straightforward in nomic relationships.
As with all learning tasks, it is imperative to control the features of the associated referent to minimise confounds. For this reason bar
length was represented in numerical terms, with a three-digit measurement in millimetres used to distinguish among the different lengths.
It was concluded that these numbers were unlikely to differ in terms of familiarity, semantics, phonology, imagery, complexity, or
difficulty. While the selection of numerals to indicate length may be considered somewhat abstract, Pansky and Algom (1999) used the
Garner interference paradigm to demonstrate that numerical magnitude and physical size are perceptually integral dimensions.
Aim, Design, and Hypotheses
The aim of the present study was to examine the relative ease of learning nomic and symbolic sound-event mappings. The experiment
employed a 2X(2X2) factorial design: the two mapping levels of nomic and symbolic, an immediate and delayed test phase, and the
between-subjects factor of everyday versus musical listening. The dependent variable was the percentage of correct responses, a measure
widely considered to be a valid indicator of learning in humans (Brand & Jolles, 1985; Greene, 1988; Lachner, Satzger, & Engel, 1994;
Savage & Gouvier, 1992). The main hypothesis under investigation was that nomic mappings are more easily learned than symbolic
mappings. In an important qualification to this prediction, it is hypothesised that these learning advantages manifest only in the
immediate phase of the everyday listening condition. The experimental hypothesis can therefore be sub-divided into four specific predictions:
Hypothesis 1: Learning in the nomic condition is superior to learning in the symbolic condition during the immediate phase within the
everyday listening group.
Hypothesis 2: Learning in the nomic condition is equivalent to learning in the symbolic condition during the delayed phase within the
everyday listening group.
Hypothesis 3: Learning in the nomic condition is equivalent to learning in the symbolic condition during the immediate phase within the
musical listening group.
Hypothesis 4: Learning in the nomic condition is equivalent to learning in the symbolic condition during the delayed phase within the
musical listening group.
Method
Participants
Participants were 40 students from the University of Western Sydney, Macarthur. Participation was voluntary and the only requirement
was that individuals had self-reported normal hearing and, for control purposes, no formal training in music.
Materials
Auditory stimuli. For the nomic variable of frequency, a scale consisting of ten sounds was constructed. An estimation was then made of
the bar lengths (measured in millimetres) necessary to produce these frequencies, when both bar width and thickness were held constant,
using a wooden xylophone as a guide. Another ten sounds were then produced for the symbolic category of damping. All sounds were
found to be distinguishable during pilot testing.
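The estimation described above can be sketched from the standard physical relation for a transversely vibrating bar of fixed width and thickness, whose fundamental frequency scales as f ∝ 1/L² (so L ∝ 1/√f). The following minimal Python sketch illustrates the idea; the reference bar length and frequency, and the choice of an equal-tempered ten-step scale, are hypothetical and not taken from the paper.

```python
# Sketch: estimating bar lengths for a scale of target frequencies,
# assuming the fundamental of a bar of fixed width and thickness
# scales as f proportional to 1/L^2 (hence L proportional to 1/sqrt(f)).
# The reference length/frequency values below are hypothetical.

REF_LENGTH_MM = 300.0   # hypothetical reference bar length (mm)
REF_FREQ_HZ = 440.0     # hypothetical frequency produced by that bar (Hz)

def bar_length_mm(target_freq_hz: float) -> float:
    """Bar length needed for target_freq_hz, given f proportional to 1/L^2."""
    return REF_LENGTH_MM * (REF_FREQ_HZ / target_freq_hz) ** 0.5

# A ten-sound scale, here taken as equal-tempered semitone steps:
scale = [REF_FREQ_HZ * 2 ** (i / 12) for i in range(10)]
lengths = [round(bar_length_mm(f), 1) for f in scale]
print(lengths)  # monotonically decreasing lengths for rising pitch
```

Doubling the frequency two octaves up (quadrupling f) halves the required length, which is why even a coarse reference measurement from a real xylophone suffices as a guide.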
Visual stimuli. Prior to hearing the struck bar, participants were presented with a green circle positioned in the centre of the screen. This
served as a focus for the mouse pointer and ensured the pointer was equidistant from all numbers at the start of each trial. At the same
time as the sound was presented, five numbers were displayed in a circular configuration surrounding the area previously occupied by
the circle. The positions of the numbers were altered in each block to control for spatial organisation.
Apparatus. The experiment was designed and conducted using Powerlaboratory (Chute & Westall, 1996) and was run on one of two
Apple Macintosh computers; a Power Macintosh 7300/200 and a Power Macintosh G3.
Procedure
Prior to testing, participants were given a brief summary of the experimental procedure. The participants were prompted to position the
mouse on their favoured side and were fitted with headphones.
The experimental session was then initiated, with a more detailed set of instructions provided on screen. The instructions varied
according to the type of listening that the condition was attempting to induce. Participants assigned to the everyday listening group were
told that they would be presented with the sound of a struck pipe, and that this sound would be paired with the length of the pipe in
millimeters. Those in the musical listening condition were told that they would hear a sound and that it would be associated with a label.
The paired-associate learning task required participants to select (using the mouse) the appropriate number after hearing a sound.
Discussion
The results support the four experimental hypotheses. Specifically, nomic mappings were learned significantly better than symbolic
mappings, but the advantage was restricted, as predicted, to the immediate phase of the everyday listening group. Figure 2 shows that the
largest discrepancy in performance between nomic and symbolic mappings occurred on the first block of the everyday listening
condition. This observation endorses the notion that nomic mappings were more intuitive than symbolic mappings.
An ecological perspective provides one explanation of these findings. The nomic mapping of pitch-size afforded useful information to
participants about the length of the struck bar. A biologically-based explanation is that humans, possibly through evolution, have adapted
to associate, relatively easily, the pitch of a sounding object with size. Such a combination represents a nomic mapping because it
conforms to unchanging physical laws or states of affairs. These conditions have accompanied humans throughout history, and
phylogenetic imprinting of these laws may provide the basis for direct event perception. However, the present experiment used adult
participants highly familiar with relations between sounds and object size and, as a result, cannot rule out the possibility that the
facilitatory effects of nomic mappings are the result of experience.
Implications for Design of Auditory Icons
The present study demonstrates that adults can determine the relative length of a struck bar from the acoustic quality of pitch.
Importantly, this finding need not be restricted to impact sounds. Gaver (1993a, 1993b) suggests that more complex sound-producing
events may be reducible to a series of impacts. For example, Gaver proposed that scraping involves multiple impacts as the moving
object falls into depressions and hits raised ridges. As a result, the current research findings should generalise to a large number of
events.
The learning advantages of nomic mappings are not necessarily confined to pitch. Damping indicates the material of a struck object, and
amplitude, with its perceptual correlate of loudness, affords information about the proximity of an event and the force of the interaction
(Gaver 1993a). Warren and Verbrugge (1984) have illustrated the role of temporal properties during event perception. The challenge for
ecological psychology is to investigate the extraction of such relevant features from the soundwave and to map each component to the
invariant information that it affords. The resultant taxonomy of sound-event mappings would provide a framework for understanding
realised and potential meaning inherent in the acoustic array.
Results of the present study also imply that nomic mappings, as illustrated with pitch and object size, provide significant initial learning
advantages over symbolic mappings. Although the advantages are realised only during the initial stages of learning, such intuitive
mappings may minimise the need for training. If invariant relations are mapped together then the meaning of an auditory icon should be
obvious, learned quickly, and resistant to extinction.
Limitations and Future Directions
Criticisms of the current experimental research may question extrapolation to perception in the real world. For instance, the reductive
psychophysical analysis employed in the present study reduced the auditory stimuli to mere caricatures of everyday sounds. Importantly,
identification of the elements crucial for event perception was what prompted Gaver (1993a) to devise algorithms for synthesising
auditory stimuli. His reasoning was that if an artificial sound manages to produce an accurate identification of the desired
sound-producing event, then the essential spectral components have been included. It is also likely that the proposed one-to-one
relationship between pitch and object size is more complex in real world settings. Frequency is known to reflect the material and shape
of an object, as well as its size (Gaver, 1993a). However, when the attributes of shape and material are held constant, there is probably a
direct relationship between pitch and size (Gaver, 1993a). Thus, despite the risk of corrupting the natural listening experience, a
controlled laboratory experiment was considered appropriate for examining the predictions of the current study.
Further exploratory, empirical research is still required in the relatively new field of ecological acoustics. First, invariant acoustic
properties need to be examined and mapped to source related information. For instance, Gaver's (1993a) predictions regarding
sound-event mappings remain to be investigated. Pilot testing during operationalisation of the nomic and symbolic categories during the
present study employed the Garner interference paradigm, which appears to hold potential for the future investigation of sound-event
mappings. Second, while the present results are consistent with notions of direct perception and preparedness, they provide definitive
evidence for neither. Developmental and even comparative studies are needed to identify the basis for the advantage of nomic mappings.
Finally, it would be of significant theoretical interest to examine the possibility of generalising these concepts to linguistic and musical
domains. Interestingly, research into infant-directed speech suggests that prosodic cues, such as melodic contour and rhythm, rather than
semantic content, communicate intent to young infants via direct manipulation of instinctive physiological responses (Fernald, 1989). It
would be most interesting to examine whether invariant acoustic relations maintain their affordances in various auditory contexts from
language to music.
References
Aslin, R. N., & Smith, L. B. (1988). Perceptual development. Annual Review of Psychology, 39, 435-473.
Ballas, J. A. (1993). Common factors in the identification of an assortment of brief everyday sounds. Journal of
Experimental Psychology: Human Perception and Performance, 19(2), 250-267.
Proceedings paper
Table of Contents
● Background
● Aims
● Main Contribution
❍ The Model ARTIST
❍ Melodic Expectancies
❍ Stylistic Expectancies
❍ Implications
❍ References
❍ Figures
Background
It is often not until we are faced with an unfamiliar musical style that we fully realize the importance of the musical mental schemata gradually acquired through our past listening experience. These cognitive structures automatically intervene as music is heard, and they are necessary to build integrated, organized percepts from acoustic sensations. Without them, as when listening to a piece in a musical style foreign to our experience, a flow of notes is like a flow of words in a foreign language: incoherent and unintelligible. The impression is that all pieces or phrases sound more or less the same, and musical styles such as Indian raga, Chinese guqin or Balinese gamelan are often described as monotonous by Western listeners new to these kinds of music. This happens to experienced, musically trained listeners as well as to listeners with no musical experience other than listening itself. It is thus clear that the mental schemata required to interpret a certain kind of music can be acquired through gradual acculturation (Francès, 1988), the result of passive listening in the sense that it requires no conscious effort or attention directed towards learning. This is not to say that formal training has no influence, only that it is not necessary and that exposure to the music is sufficient.
Becoming familiar with a particular musical style usually implies two things: (1) the memorization of particular melodies, and (2) an intuitive sense of the prototypicality of musical sequences relative to that style (e.g., the sense of tonality in the context of Western music). These underlie two kinds of expectancies: melodic and stylistic. Melodic (also called 'veridical') expectancies rely on the listener's familiarity with a particular melody, and refer to knowing which notes will come next after hearing part of it. Stylistic expectancies rely on the listener's familiarity with a particular musical style, and refer to the sense of which notes should, or will probably, follow a passage for the piece to fit that style. These expectancies can be probed in different ways: for instance with Dowling's (1973) recognition task of familiar melodies interleaved with distractor notes, and with Krumhansl and Shepard's (1979) probe-tone technique, respectively.
Aims
Some connectionist models of tonality have been proposed before, but they are rarely realistic in that they often use a priori knowledge from the musical domain (e.g., octave equivalence) or are built without going through learning (Bharucha, 1987; extended by Laden, 1995). This paper presents an Artificial Neural Network (ANN), based on a simplified version of Grossberg's (1982) Adaptive Resonance Theory (ART), to model the tonal acculturation process. The model does not presuppose any musical knowledge except the categorical perception of pitch for its input, which is a research problem in itself (Sano and Jenkins, 1989) and beyond the scope of this paper. The model develops gradually through unsupervised learning: it needs no information other than that present in the music to generate the schemata, just as humans do not need a teacher. Gjerdingen (1990) used a similar model for the categorization of musical patterns, but did not aim at checking the cognitive reality of these musical categories. Page (1999) also successfully applied ART2 networks to the perception of musical sequences. The goal of the present paper is to show that this simple and realistic model is cognitively pertinent, by comparing its behaviour directly with that of humans on the same tasks. As mentioned in the previous section, these tasks were chosen because they are robust, having stood the test of time, and because they reflect broad and fundamental aspects of music cognition.
Main contribution
The Model ARTIST
The ART2 self-organizing ANN (Carpenter and Grossberg, 1987) was developed for the classification of analogue input patterns and is well suited to music processing. It is somewhat more complex than what is needed here, so a few simplifications were made to build the present model, ARTIST (Adaptive Resonance Theory to Internalize the Structure of Tonality). ARTIST is made up of two layers (or fields) of neurons, the input field (F1) and the category field (F2), connected by synaptic weights that play the role of both Bottom-Up and Top-Down connections. Learning occurs through the modification of the weights, which progressively tune the 'category units' in F2 to be most responsive to a certain input pattern (the 'prototype' for that category). The weights store the long-term memory of the model.
❍ Input Layer
The neurons in F1 represent the notes played. For now the model is tested only with conventional Western music, so an acoustic resolution of one neuron per semitone is sufficient to code the musical pieces used. This is the only constraint the assumption of Western music imposes on the model, and it can easily be overridden simply by changing the number of input nodes. Bach's 24 preludes from the Well-Tempered Clavier were used for learning. The notes they contain span 6 octaves; with 12 notes per octave, 72 nodes are needed to code the inputs. The activation of the inputs is updated at the end of every measure. Each note played within the last measure activates its corresponding input neuron proportionally to its loudness (or velocity; notes falling on beats 1 and 3 were accentuated) and according to an exponential temporal decay (activation is halved every measure). Before the activation is propagated to F2, the activation in F1 is normalized. Each prelude was transposed into the 12 possible keys, so 288 pieces were available for training ARTIST.
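As a rough sketch, the measure-by-measure input coding described above might look as follows. The function name, note indices and velocity values are illustrative assumptions; the paper specifies only the 72-node layout, the halving decay, the loudness-proportional activation and the normalization.

```python
import numpy as np

N_NODES = 72  # 6 octaves x 12 semitones per octave

def update_input(prev_activation, notes_in_measure):
    """One update of the F1 input field at the end of a measure.

    notes_in_measure: list of (node_index, velocity) pairs; velocity is
    assumed to be already accentuated for notes on beats 1 and 3.
    """
    a = prev_activation * 0.5           # exponential decay: halved every measure
    for node, velocity in notes_in_measure:
        a[node] += velocity             # activation proportional to loudness
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a  # normalize before propagation to F2

# Toy example: two notes sounded in the current measure
a1 = update_input(np.zeros(N_NODES), [(24, 1.0), (28, 0.8)])
```

The decay keeps recently heard notes active across measure boundaries, so the F1 pattern reflects a short temporal context rather than a single measure.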
❍ Classification Layer and Learning
Melodic Expectancies
When we are very familiar with a melody, we can usually still recognize it after various transformations such as transposition or rhythmic and tonal variations. This is not the case when distractor (random) notes are added in between the melody notes: even the most familiar tunes become unrecognizable as long as the distractors 'fit in' (i.e., no primary acoustic cue such as frequency range, timbre or loudness segregates them; Bregman, 1990). However, when given a few possibilities regarding the identity of the melody, listeners can positively identify it (Dowling, 1973). This means that Top-Down knowledge can be used to test hypotheses and categorize stimuli. For melodies, this knowledge takes the form of a pitch-time window within which the next note should occur, and it enables the direction of auditory attention (Dowling, Lung & Herrbold, 1987; Dowling, 1990). As the number of possibilities offered to the subject increases, the ability to name the tune decreases: when Top-Down knowledge becomes less focused, categorization gets more difficult. With its built-in mechanism of Top-Down activation propagation, ARTIST can be subjected to the same task.
❍ Method
To make ARTIST very familiar with the first 2 measures of 'Twinkle twinkle little star', the learning rate and vigilance were set to their maximum (equal to 1 for both), so that the learning procedure would create two new F2 nodes to memorize those two exemplars and act as labels for the tune. Had the vigilance level been too low, the tune would have been assimilated into an already existing category, whose activation could not have been interpreted as recognition of the tune. After learning the tune, the activation in F2 was recorded under 5 conditions, corresponding to the presentation of:
■ The original melody alone
■ The melody with distractors, testing only the 'Twinkle' hypothesis
■ The melody with distractors, testing multiple hypotheses
■ The melody with distractors, with no Top-Down activation
■ The distractors alone, without the melody (control)
The control condition is necessary to make sure that testing the hypothesis that the tune is 'Twinkle twinkle...' by activating the label nodes does not always provoke false alarms.
For each condition, the activation ranks of the 2 label nodes were computed (1 for the most activated node, 2 for the next, and so on) and summed. A low sum indicates few categories competing and interfering with the recognition, and a probable "Yes" response to the question "Is this Twinkle twinkle?". As the sum of the ranks increases, meaning the label nodes are overpowered by other categories, the response tends towards "No".
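The summed-rank measure can be sketched as below; this is a hypothetical reconstruction, and the function name and toy activation values are not from the paper.

```python
import numpy as np

def summed_rank(f2_activation, label_nodes):
    """Sum of the activation ranks of the label nodes (1 = most activated).

    A low sum means the label categories dominate the F2 field,
    i.e. a probable "yes, this is the tune" response.
    """
    order = np.argsort(-f2_activation)         # indices by decreasing activation
    rank = np.empty(len(order), dtype=int)
    rank[order] = np.arange(1, len(order) + 1)
    return int(sum(rank[n] for n in label_nodes))

# Toy F2 field with 6 categories; nodes 0 and 1 are the tune's label nodes
act = np.array([0.9, 0.8, 0.3, 0.2, 0.1, 0.05])
s = summed_rank(act, [0, 1])  # 1 + 2 = 3, the smallest possible sum
```

Ranks rather than raw activations are used because, with many categories, small activation differences can translate into large rank differences, sharpening the contrast between conditions.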
❍ Results
Results appear in Figure 1. The straight presentation of the tune results in the smallest possible summed rank, equal to 3: ARTIST recognizes the melody unambiguously. When the melody is presented with distractors, the ranks are higher, indicating some difficulty in recognition. Among these 3 conditions, the lowest ranks are found when testing the 'Twinkle' hypothesis and only that one. The label nodes are amongst the top 5 most activated, which suggests a strong possibility of identifying the melody. Identification gets much more difficult when testing multiple hypotheses, about as difficult as without Top-Down activation (no explicit hypothesis being tested), exactly as when human subjects are given no clue about the possible identity of the melody. Finally, the control condition shows that ARTIST does not imagine recognizing the melody amongst distractors when it is not there, even after priming the activation of its notes through Top-Down propagation.
Equivalent results can be obtained by summing the activations of the label nodes instead of computing their ranks. However, given the large number of categories, small differences in activation (especially in the middle range) imply strong differences in ranks, and therefore the latter measure was preferred to exhibit the contrast between conditions. In any case, the ordering of the conditions by likelihood of recognition of the familiar melody is the same for ARTIST and humans, and the effects of melodic expectancies can easily be observed in ARTIST.
Stylistic Expectancies
The most general and concise characterization of tonality (and therefore of most Western music) probably comes from the work of Krumhansl (1990). With the probe-tone technique, she empirically quantified the relative importance of pitches within the context of any major or minor key, by what is known as the 'tonal hierarchies'. These findings are closely related to just about every aspect of tonality and of pitch use: frequency of occurrence, accumulated durations, aesthetic judgements of all sorts (e.g., pitch occurrence, chord changes or harmonization), chord substitutions, resolutions, and so on. Many studies support the cognitive reality of the tonal hierarchies (Jarvinen, 1995; Cuddy, 1993; Repp, 1996; Sloboda, 1985; Janata and Reisberg, 1988). All of this suggests that subjecting ARTIST to the probe-tone technique is a good way to test whether it has extracted a notion of tonality (or its usage rules) from the music it was exposed to, or at least elements that enable a reconstruction of what tonality is.
❍ Method
The principle of the probe-tone technique is quite simple. A prototypical sequence of chords or notes is used as musical context, to establish a sense of key. The context is followed by a note, the probe tone, which subjects have to rate on a scale reflecting how well the tone fits within this context. Repeating this procedure for all 12 possible probe notes generates the tone profile of the given key. Out of the many types of contexts used by Krumhansl et al. over several experiments, the 3 standard ones were retained to test ARTIST: for each key and mode (major and minor), the corresponding chord, the ascending scale and the descending scale were used as contexts. Several keys are used so the results do not depend on the choice of a particular reference pitch. Here all 12 keys are used (as opposed to 4 by Krumhansl), and ARTIST's profile is obtained by averaging the profiles obtained with the 3 contexts for each key, after transposition to a common tonic. The tone profile obtained for each mode is thus the result of 432 trials (3 contexts × 12 keys × 12 probes). After each trial, the activations of the F2 nodes were recorded. Following Katz's (1999) idea that a network's total activation relates directly to pleasantness, the sum of all activations in F2 is taken as ARTIST's response to the stimulus, the index of its receptiveness/aesthetic judgement towards the musical sequence.
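The averaging of profiles over contexts and keys, after transposition to a common tonic, can be sketched as follows; the array layout and function name are assumptions, not from the paper.

```python
import numpy as np

N_CONTEXTS, N_KEYS, N_PROBES = 3, 12, 12

def tone_profile(responses):
    """Average probe-tone profile over contexts and keys.

    responses[c, k, p]: total F2 activation after context c in key k,
    followed by probe tone p.  Each key's profile is transposed
    (rotated) to a common tonic before averaging.
    """
    profile = np.zeros(N_PROBES)
    for c in range(N_CONTEXTS):
        for k in range(N_KEYS):
            profile += np.roll(responses[c, k], -k)  # align tonic with index 0
    return profile / (N_CONTEXTS * N_KEYS)           # mean over 36 profiles
```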
In the first studies using the probe-tone technique, it appeared that subjects judged the fitness of the probe tone more as a function of its distance in acoustic frequency from the last note played than as a function of tonal salience. This follows naturally from the usual structure of melodies, which favours small steps between consecutive notes. The problem was circumvented by using Shepard tones (Shepard, 1964). These are designed to preserve the pitch identity of a tone while removing the cues pertaining to its height, and their use to generate ever-ascending scale illusions proves that they indeed possess this property. Shepard tones are produced by generating all the harmonics of a note, filtered through a bell-shaped amplitude envelope. To simulate Shepard tones for ARTIST, the notes are played simultaneously in all 6 octaves, with different velocities (loudness) according to the amplitude filter: high velocities for the middle octaves, decreasing as the notes approach the boundaries of the frequency range.
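The Shepard-tone simulation described above might be coded as below; the Gaussian envelope and its width are assumptions, since the paper says only 'bell-shaped'.

```python
import numpy as np

N_OCTAVES, SEMITONES = 6, 12

def shepard_input(pitch_class, sigma=1.5):
    """Shepard-tone-like input: one pitch class in all 6 octaves, with a
    bell-shaped velocity envelope peaking in the middle octaves.
    """
    octaves = np.arange(N_OCTAVES)
    centre = (N_OCTAVES - 1) / 2.0
    velocities = np.exp(-((octaves - centre) ** 2) / (2 * sigma ** 2))
    a = np.zeros(N_OCTAVES * SEMITONES)
    a[octaves * SEMITONES + pitch_class] = velocities
    return a
```

Because every octave carries the same pitch class, the input conveys chroma without a single well-defined height, mirroring the role Shepard tones play for human subjects.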
❍ Results
Figures 2 and 3 allow direct comparison of the tone profiles obtained from human data (Krumhansl and Kessler, 1982) and from ARTIST, for major and minor keys respectively. Both Pearson correlation coefficients between the human and ARTIST profiles are significant: -.95 and -.91 respectively, p<.01 (2-tailed). Surprisingly, the correlations are negative, so ARTIST's profiles are inverted in the figures for easier comparison with the human data; this is discussed in the next section. Once the tone profile for a particular key is available, the profiles of all other keys can be deduced by transposition. The correlation between two different key profiles can then be computed as a measure of the distance between the two keys. This is the procedure Krumhansl used to obtain all the inter-key distances, and the same was done with the ARTIST data, to check whether its notion of key distances conforms to that of humans. Both graphs of the distances between C major and all minor keys are shown in Figure 4. Keys on the X-axis appear in the same order as around the circle of fifths. It is immediately apparent that the two profiles are close to identical. This is even more true for the key distances between C major and all major keys, as well as for those between C minor and all minor keys: the correlations between human and ARTIST data for major-major, minor-minor and major-minor key distances are respectively .988, .974 and .972, all significant at p<.01. ARTIST thus clearly emulates human responses on the probe-tone task, and can therefore be said to have developed a notion of tonality, with the tonal invariants extracted directly from the musical environment.
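Krumhansl's transposition-and-correlation procedure for inter-key distances can be sketched with the published Krumhansl & Kessler (1982) C major profile; the function name is an illustrative assumption.

```python
import numpy as np

# Krumhansl & Kessler (1982) C major tone profile (C, C#, D, ..., B)
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def key_distances(profile):
    """Correlation of a tone profile with its 12 transpositions, used as
    a measure of inter-key distance (Krumhansl's procedure).
    profile: 12-element tone profile with the tonic at index 0.
    """
    return np.array([np.corrcoef(profile, np.roll(profile, k))[0, 1]
                     for k in range(12)])

d = key_distances(KK_MAJOR)
# The closest major keys to C are G and F, a fifth away on either side,
# as expected from the circle of fifths.
```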
Implications
From the two simulations above we can see that it is easy to subject ARTIST, in a natural way, to the same musical tasks as given to humans, and that it approximates human behaviour very closely on these tasks. When probed with the standard techniques, it shows both melodic and stylistic expectancies, the two main aspects of musical acculturation. ARTIST learns unsupervised, and its knowledge is acquired from exposure to music alone, so it is a realistic model of how musical mental schemata can be formed. The implication is that all that is needed to accomplish these complex musical processes and to develop mental schemata is a memory system capable of storing information according to similarity and of abstracting prototypes from similar inputs, while constantly interpreting the inputs through the filter of Top-Down (already acquired) knowledge. From the joint action of the mental schemata results a musical processing sensitive to tonality. This property emerges from the internal organisation of the neural network; it is distributed over its whole architecture. It can thus be said that the structure of tonality has been internalized. Testing ARTIST with other musical styles could further establish it as a general model of music perception. In the simulation of the probe-tone task, ARTIST's response has to be recorded before any lateral inhibition occurs in F2; otherwise, the sum of all activations in F2 would simply be that of the winner, all others being null, and much information regarding ARTIST's reaction would be lost. This takes one step further Gjerdingen's (1990) argument for using ANNs, namely that cognitive musical phenomena are probably too
References
Bharucha, J.J. (1987). Music cognition and perceptual facilitation: A connectionist framework. Music Perception, 5(1), 1-30.
Bregman, A.S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Carpenter, G.A., & Grossberg, S. (1987). ART2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26, 4919-4930.
Cuddy, L.L. (1993). Melody comprehension and tonal structure. In T.J. Tighe & W.J. Dowling (Eds.), Psychology and music: The understanding of melody and rhythm. Hillsdale, NJ: Erlbaum.
Dowling, W.J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W.J. (1990). Expectancy and attention in melody perception. Psychomusicology, 9(2), 148-160.
Dowling, W.J., Lung, K.M.T., & Herrbold, S. (1987). Aiming attention in pitch and time in the perception of interleaved melodies. Perception and Psychophysics, 41, 642-656.
Francès, R. (1988). La perception de la musique (W.J. Dowling, Trans.). Hillsdale, NJ: Erlbaum. (Originally published 1954, Librairie philosophique J. Vrin, Paris.)
Gjerdingen, R.O. (1990). Categorization of musical patterns by self-organizing neuronlike networks. Music Perception, 7, 339-370.
Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition and motor control. Boston: D. Reidel/Kluwer.
Janata, P., & Reisberg, D. (1988). Response-time measures as a means of exploring tonal hierarchies. Music Perception, 6(2), 161-172.
Jarvinen, T. (1995). Tonal hierarchies in jazz improvisation. Music Perception, 12(4), 415-437.
Katz, B.F. (1999). An ear for melody. In N. Griffith & P. Todd (Eds.), Musical networks: Parallel distributed perception and performance (pp. 199-224). Cambridge, MA: MIT Press.
Krumhansl, C.L. (1990). The cognitive foundations of musical pitch. Oxford Psychology Series, No. 17.
Krumhansl, C.L., & Kessler, E.J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334-368.
Krumhansl, C., & Shepard, R. (1979). Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5, 579-594.
Laden, B. (1995). Modeling cognition of tonal music. Psychomusicology, 14, 154-172.
Page, M.P.A. (1999). Modelling the perception of musical sequences with self-organizing neural networks. In N. Griffith & P. Todd (Eds.), Musical networks: Parallel distributed perception and performance (pp. 175-198). Cambridge, MA: MIT Press.
Repp, B.H. (1996). The art of inaccuracy: Why pianists' errors are difficult to hear. Music Perception, 14(2), 161-184.
Sano, H., & Jenkins, B.K. (1989). A neural network model for pitch perception. Computer Music Journal, 13(3).
Shepard, R.N. (1964). Circularity in judgments of relative pitch. The Journal of the Acoustical Society of America, 36, 2346-2353.
Sloboda, J.A. (1985). The musical mind. Oxford Psychology Series, No. 5.
Figures
From top to bottom:
Figure 1. Summed ranks of the 2 label nodes for 'Twinkle twinkle' as a function of the stimulus played and the hypothesis tested.
Figure 2. Comparison of the ARTIST and Krumhansl & Kessler (1982) C major tone profiles (correlation = .95).
Figure 3. Comparison of the ARTIST and Krumhansl & Kessler (1982) C minor tone profiles (correlation = .91).
Figure 4. Comparison of the ARTIST and Krumhansl & Kessler (1982) inter-key distances between C major and all minor keys (correlation = .972).
Proceedings paper
Unresolved Issues in Continuous Response Methodology: The Case of Time Series Correlations
Emery Schubert
School of Music and Music Education
University of New South Wales
Sydney 2052 NSW
AUSTRALIA
Phone: +61-2-9385 6808
Fax: +61-2-9313-7326
Email: E.Schubert@unsw.edu.au
ABSTRACT
Background: While continuous response methodologies have become increasingly popular among researchers of
emotional response to music, the literature is very light on critical analysis of the methodology.
Aims: This paper investigates the common formats in which the methodology has appeared: open-ended, checklist, and
rating scale; the kinds of problems for which it has been used: validation, comparison, the relationship between
stimulus and response, and the dynamic lag structure of the music-emotion system; and the analytic techniques
which have been applied: interoccular tests, correlation analysis, repeated measures approaches, and traditional time
series analytic techniques.
Main Contribution: The most popular continuous response format is the rating scale; however, there is little
experimental evidence to support the reliability of this format over the checklist or the open-ended format. Also unclear
are the kind of rating scale to use (unipolar or bipolar), the number of scales to use simultaneously (one, two, or three),
the response sampling rate, and the label identifiers of the scales. Another serious problem facing continuous response
research is the analysis of data. While time-series textbooks have for a long time warned against the use of visual
inspection as the sole method of analysing continuous data, the literature is riddled with conclusions based on just such
a technique.
Implications: In this paper I argue that elementary methods of time series analysis can be applied by researchers to
produce a more valid basis for investigating their data. I also argue that continuous response methodology in
music-emotion research is in its infancy, as evidenced by the large proportion of validation and comparative studies.
If and when the methodology matures, its most beneficial application will be in helping to understand the dynamic
structure of the music-emotion system, and not so much the understanding of basic stimulus-response relationships,
which traditional asynchronous approaches can do more efficiently.
FULL PAPER
Introduction
A common problem in studying emotional responses to music is that of lacking ecological
(naturalistic) validity. In the typical study, a listener will hear an excerpt of music and at the end of the
excerpt he or she will be asked to indicate the emotion expressed by the music or experienced by the
listener (e.g., Gabrielsson & Juslin, 1996; Heinlein, 1932). This is a highly efficient way of collecting
data on emotional response to music. However, such instantaneous responses cannot tap into the
subtle patterns of emotion which change from moment to moment through the course of a listening.
For example, they cannot provide information about the lag structure between one response and
another, or between response and stimulus (Schubert & Dunsmuir, 1999).
One of the remedies for this problem is to measure responses to the musical stimulus continuously.
Instead of making a response at the end of an excerpt, the individual is continually assessing the
expressed or experienced emotion during listening. Such continuous responses enable the researcher
to build up a profile of the relationship between the stimulus and the response within a more realistic
musical context and psychologically valid framework. However, this methodology brings with it a
range of problems, many of which are yet to surface.
In this presentation I will mention methodological issues and concerns of which researchers using
continuous response devices should be aware. I will then focus on one such problem, specifically the
question of correlating comparative time series data.
General Methodological Issues
Two broad categories of problems in continuous response methodology are the response task
requirement and the analysis of continuous response data. An example of the response task problem is
that concentrating continuously on the response task is itself unnaturalistic. This should be a cause
for concern, for it contradicts the initial motivation for the methodology (ecological validity). However,
continuous response researchers have found what appear to be reasonably adequate solutions to the
problem. A typical solution is to make the continuous task simple by having responses made on a
single scale, such as amount of emotion (Krumhansl, 1998), tension (Madsen & Fredrickson, 1993;
Nielsen, 1983) or aesthetic experience (Madsen, Brittin & Capperella-Sheldon, 1993), which a
computer samples automatically in the background.
More serious are the issues regarding the analysis of continuous, time-series data. Many researchers of
emotional response to music who have chosen to adopt continuous response methodologies have yet
to come to terms with the issues that are pertinent in time series data analysis (Schubert, in press).
Amongst the analytic issues there are problems which lie on either side of a spectrum of
methodological issues (Figure 1). On one pole a large amount of continuous data is obtained but the
point of the collection is not immediately apparent (I call the extreme of this pole 'no analysis'). For
example, if a researcher is going to collect time series data and then report on the time-average
(perhaps because he or she cannot find an appropriate way to analyse the data in its time series form),
the researcher should consider whether the extra effort in collecting continuous data was worthwhile.
On the other end of the spectrum, analysis is often applied which is appropriate for parametric data,
but not for serially correlated data (this is a problem of using parametric inferential statistical
analysis). For example, Analysis of Variance is generally not an appropriate form of analysis for time
series data because the assumption of independent, normally distributed data is usually violated
(Gibbons, 1993). Somewhere along this spectrum lies the most common problem of emotional
response studies which analyse continuous responses: the interpretation of a visual inspection of the
time series. Gottman (1981) refers to this as an 'interoccular test' and warns against the use of this
descriptive approach as the sole means of analysing time series data. Many of these issues are well
known, particularly in the fields of Economics, Geography and Engineering, and there exists firmly
grounded literature explaining and correcting these problems (e.g., Box & Jenkins, 1976; Hamilton, 1994).
In this presentation I will focus on one issue: the use of Pearson's product-moment correlation
technique for comparing two or more time series.
Figure 1. Spectrum of problems associated with time series data analysis of emotional
response literature.
Many of the studies investigated do not attempt to support their findings by falsification (Stanovich, 1998). They tend to
report positive relationships (significant correlations) without examining correlations that should not
be significant. Consequently, these researchers cannot know whether their significant
correlations (and they almost always are, or appear to be, significant) are meaningful, or whether they
are in fact false correlations that have appeared due to underlying serial correlation. (A meaningful
correlation would give information about the music stimulus, not just the measuring instrument.) The
assumption of independent sampling behind the Pearson product-moment correlation is quite likely
violated in time series data.
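The danger can be illustrated with a small simulation that is not from the paper: two independent random walks, which are strongly serially correlated like continuous emotion responses, often show a large Pearson correlation by chance alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two *independent* random walks, i.e. strongly serially correlated
# series, the length of a 3-minute response sampled once per second.
n = 180
x = np.cumsum(rng.standard_normal(n))
y = np.cumsum(rng.standard_normal(n))

r = np.corrcoef(x, y)[0, 1]
# |r| is frequently large even though x and y share no common cause:
# the nominal p-value of the Pearson test assumes independent samples,
# an assumption these series violate.
```

Running this for many seed pairs shows that |r| > 0.5 occurs in a sizeable fraction of trials, whereas for white noise of the same length it almost never does; this is the classic spurious-correlation problem for serially correlated data.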
A particular study prompted me to investigate the situation with correlations and time series data
further. Fredrickson (1997) reported the results of continuous tension responses as sample-by-sample
mean responses from 2nd grade, 5th grade, 8th grade, 11/12 grade students, professional and
non-professional musician listeners. All groups were highly correlated with one-another. Fredrickson
ranked the coefficients to demonstrate the strength of similarity between the various groups. He then
reported that "of particular note is that the correlations, including the lowest one of .71 between the
second graders and the musicians, were all significant at the [sic] p = 0.001 level" (p. 630). However,
if the correlation of 0.71 is inflated, or is a false correlation due to underlying serial correlation, it
leaves the way open for a flood of studies to report incorrectly high correlation coefficients and to treat
them as meaningful results. While I believe that something like this is already happening in the
literature, I do not argue that Fredrickson's analysis is necessarily wrong. Instead I
felt that the use of correlation analysis of time series emotional response data required some
investigation.
In this paper I present some data which address the issue of correlation coefficients for serially
correlated, time-series responses. I do not claim to find a solution to the problem, but I do intend to
make researchers cognisant of issues concerning the application of correlation coefficients.
Monte Carlo Study
Using a sample of data from a study by Schubert (1999a), I constructed a pseudo-Monte Carlo study
to investigate the behaviour of the correlation coefficient calculation. The study is pseudo-Monte
Carlo because I did not select data from a predetermined distribution (Mooney, 1997). Instead, I used
actual time series data which were collected in the form of emotional responses to music. The data
come from continuous responses to three pieces of music: Edvard Grieg's ‘Morning Mood' from Peer
Gynt, Joaquin Rodrigo's Adagio movement from the Concierto de Aranjuez for Guitar and Orchestra,
and Antonin Dvorak's Slavonic Dance No. 1, Op. 46. For each piece there existed two bipolar time series
responses: the arousal response (the amount of arousal or sleepiness expressed by the music) and the
valence response (the amount of happiness or sadness expressed by the music). Each response was
recorded by computer once per second on a scale of -100 to +100 for arousal and valence. Over
seventy participants' data were available for each piece and emotional response dimension.
Hypothesis
In a Monte Carlo-type study, the hypothesis is assumed to be true, and the data are evaluated
according to how well they fit the hypothesis (cf. Mooney, 1993). In the present study, the hypothesis is
that responses by different participants will be correlated for the same dimension and piece of music,
and in all other conditions they will be uncorrelated (falsification). For example, all subjects' arousal
responses to Morning will be correlated; however, their valence responses to Morning, or to any other
piece (or dimension), will not be correlated with this arousal response.
Method
For the present study I randomly sampled 16 responses from each of the three pieces. The first 200
seconds of each piece was selected. For simplicity and to conserve space, I will make reference to a
subset of five responses from each piece, but the processes and findings discussed apply to the entire
sample of sixteen.
Analysis
The data were factor analysed using a six-factor varimax-rotated solution in SPSS 6.1.1 for the
Macintosh. The analysis was conducted using the original data sets. A second analysis was
conducted using the first-order differenced data sets. Differencing examines changes in responses
rather than absolute responses, and is a technique used to reduce serial correlation in time series data
(Gottman, 1981). By subtracting each sample from the one that follows it (in time), a first-order
difference series is generated. This series corresponds to the gradient of the original series.
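As a minimal sketch (the response values here are invented, not taken from the study), the transformation is simply:

```python
import numpy as np

# A made-up continuous-response fragment on the -100..+100 scale,
# sampled once per second.
arousal = np.array([10, 14, 19, 23, 22, 20, 25])

# First-order difference: each sample minus the one before it,
# i.e. the second-by-second change (gradient) of the response.
d_arousal = np.diff(arousal)

print(d_arousal.tolist())  # [4, 5, 4, -1, -2, 5]
```

Note that the differenced series is one sample shorter than the original.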
A sample of the factor loadings for each analysis is shown in Table 1. Only factor loadings above
.4 are considered. Relative to the undifferenced data, the differenced data are much closer to the
expected model stated by the hypothesis. When the data are differenced, they tend to load fairly neatly
onto separate factors grouped by the response dimension (Arousal or Valence) and musical item. For
example, the first-order differenced arousal responses to the Dvorak (responses labelled A_Dxx in
Table 1) load onto factor 3 for each of the five participants shown. When the same data are not
differenced, there is still a grouping of the data but, in the case of the Dvorak arousal data, the loading
occurs on two factors (1 and 3), contrary to the hypothesis. Further, undifferenced data factors are
more frequent and more scattered.
Table 1 Factor loadings for undifferenced (untreated) data and differenced (serial correlation
adjusted) data.
Undifferenced Data Factors Differenced Data Factors
UF 1 UF 2 UF 3 UF 4 UF 5 UF 6 RS DF 1 DF 2 DF 3 DF 4 DF 5 DF 6
0.78 A_GAL
0.71 A_GAI
0.75 A_GCO
-0.46 V_D15
0.71 V_GAN
V_RJU 0.69
UF = Undifferenced Factor
DF = Differenced Factor
RS = Response Sample
A_ = Arousal response
V_ = Valence response
G = Grieg Morning
R = Rodrigo Adagio
D = Dvorak Slavonic Dance
The factor analyses provide evidence that serial correlation is present in the undifferenced data, and
suggest that a correlation between any pair of participants is more likely to yield a misleadingly
high coefficient than when the data are first-order difference transformed. For example, the
undifferenced arousal response to the Dvorak for any particular participant is likely to correlate with
another participant's arousal response to the same piece. This result is consistent with the hypothesis;
however, Table 1 also demonstrates that a significant correlation is likely to occur with any of the
other pieces or dimensions, because reasonably large loadings onto factors 1 and 3 exist for each of
the other examples. According to the hypothesis, this is an incorrect result.
The differenced data still posed some problems. Dvorak Valence and Dvorak Arousal load onto the
same factor for all but one of the participants. Further, some factor loadings have signs that are
inconsistent with the rest of their group. For example, factor 3 in the ‘Dvorak Valence' group has
two negative loadings and two positive loadings. (Note: Factor 4 contains no loadings, probably
because of sampling error and because only factor loadings greater than 0.4 are shown.) The first
problematic finding can be reconciled by a closer examination of the Dvorak
responses. For this piece the valence and arousal were more correlated than for other pieces (meaning
that the hypothesis requires correction, or that different-dimension responses to the same piece should
not have been compared). The second problematic finding could be explained by sampling error. With
such a small sample chosen for analysis (Monte Carlo studies are considerably larger) the effect of
sampling error becomes quite problematic (16 per group in the original study, five shown in Table 1).
The important point, however, is that the differenced responses are considerably better grouped than
the undifferenced responses.
Discussion and Conclusions
The Monte Carlo-type study demonstrates that Pearson product-moment correlations between time
series responses tend to be inflated and misleading. A better result was obtained when each time series
was first-order differenced. Differencing, in this case, removes some of the serial correlation from the
data. The amount of serial correlation in the data can be diagnosed by examining the autocorrelation
function, not discussed here (see Schubert & Dunsmuir, 1999). Another possible method of
controlling the inflation of the correlation coefficient and the possibility of false correlation is to use
more conservative correlation analyses such as Spearman's rho or Kendall's Tau (Howell, 1997).
However, the mathematical derivation of these methods is not based on principles of time series. My
own investigation of correlation coefficient matrices (again a Monte Carlo-like study on the above
data) demonstrated minimal reduction in the number of false correlations when data is undifferenced
(matrices not shown here to conserve space). Consequently, the conclusion drawn from the present
investigation is that it is appropriate to control serial correlation before calculating correlation
coefficients, and that a simple method of controlling serial correlation is to apply a
first-order-difference transformation to the data.
While there are many issues that are of concern to emotion-in-music researchers who adopt
continuous response methodologies, the present investigation and those discussed elsewhere (Beran
and Mazzola, 1999; Schubert, 1999b; Schubert & Dunsmuir, 1999) suggest that there are simple
techniques available for dealing with many of these matters. However, for continuous response
methodology to be a plausible alternative to conventional, more efficient approaches, researchers must
become aware of the issues and the solutions. In particular, the issue of serial correlation needs to
receive more consideration than is currently the case in the literature.
References
Beran, J. & Mazzola, G. (1999). Analysing musical structure and performance - A
statistical approach. Statistical Science, 14, 47-79.
Box, G. E. P. & Jenkins, G. M. (1976). Time series analysis: Forecasting and control
(Rev. ed.). San Francisco: Holden-Day.
Fredrickson, W. E. (1997). Elementary, middle, and high school perceptions of tension in
music. Journal of Research in Music Education, 45, 626-635.
Fredrickson, W. E. (1999). Effect of musical performance on perception of tension in
Gustav Holst's first suite in E-flat. Journal of Research in Music Education, 47, 44-52.
Frego, R. J. D. (1999). Effects of aural and visual conditions on response to perceived
artistic tension in music and dance. Journal of Research in Music Education, 47, 31-43.
back to index
Proceedings abstract
1. Background
Since the question "Is hearing all cochlear?" was posed some seven decades ago
(Tait, 1932), considerable evidence has amassed that the sacculus, an
organ of hearing in fish (Popper et al, 1982), has retained some acoustic
sensitivity throughout phylogeny (McCue and Guinan, 1995). In humans, myogenic
vestibular evoked potentials (MVEP) may be obtained from motorneurones
innervated by the vestibulo-spinal tract, particularly from the cervical region
of the spinal cord (Ferber-Viart et al, 1999). MVEP has been studied
principally as a non-invasive clinical tool for the evaluation of normal otolith
vestibular function, since traditional nystagmographic methods assess only
canal function. However, we have been interested in using MVEP as a window on
what possible 'auditory' function the acoustically sensitive sacculus may have.
2. Aim
3. Method
4. Results.
1. MVEP shows frequency tuning (Todd, Cody and Banks, in press), with a best
frequency between 300 and 350 Hz and a band-width of about 3 octaves.
This response is consistent with its being saccularly mediated,
particularly since we were able to model the selectivity by means of a
mass-spring-damper system with a Q of about 0.7.
2. MVEP can be obtained to 'natural' acoustic stimuli (Todd and Cody, 2000),
such as dance music, above about 90 dB SPL.
3. MVEP can be obtained to continuous sounds (Todd, in preparation), i.e. a
frequency-following response may be obtained with longer duration stimuli,
suggesting that acoustically evoked phase-locking takes place in the
saccular nerve, giving rise to the adaptation and inhibition characteristic
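The tuning claim in point 1 can be checked against a textbook second-order resonance. A band-pass response with the reported parameters (best frequency around 325 Hz, Q of about 0.7) peaks at the best frequency and, under a half-amplitude criterion, spans roughly three octaves; the transfer-function form here is my assumption, not necessarily the authors' model.

```python
import numpy as np

f0, Q = 325.0, 0.7  # parameters reported in the abstract

def gain(f):
    # Magnitude of a standard second-order band-pass resonance
    # (mass-spring-damper); normalised to 1 at the best frequency.
    r = f / f0
    return (r / Q) / np.sqrt((1 - r**2) ** 2 + (r / Q) ** 2)

f = np.linspace(50, 2000, 40000)
g = gain(f)

f_peak = f[np.argmax(g)]               # best frequency of the tuning curve
half = f[g >= 0.5]                     # half-amplitude (-6 dB) passband
octaves = np.log2(half[-1] / half[0])  # bandwidth in octaves

print(f"peak: {f_peak:.0f} Hz, half-amplitude bandwidth: {octaves:.1f} octaves")
```

With Q = 0.7 the resonance is broad and gentle, which is consistent with the roughly three-octave band-width the abstract reports.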
5. Conclusions
Given the above, there are a number of natural stimuli, including vocal
and musical sounds, where saccular acoustic sensitivity may play a role in
perception. Such sensations could be vestibular or 'auditory' since there
exists anatomical evidence of a saccular projection to the cochlear nucleus
(Burian et al, 1989). Further, such a mechanism could interact with a more
general sensory-motor mechanism (Todd, 1999) involved in rhythmic processing,
particularly given the input that the vestibular system has to the cerebellum.
Proceedings abstract
DEVELOPING PROCESS BASED INVENTING ACTIVITIES: A SPIDER'S WEB OF INTRIGUE AND
CREATIVITY
Mr Charles Byrne
c.g.byrne@strath.ac.uk
Background:
The Scottish system of examination and assessment of the secondary school music
curriculum places emphasis on the certification of children's achievements in
musical invention, which may be to the detriment of developing lifelong
learning skills, interests and enthusiasm. World Wide Web-based materials have
been created which provide teachers and pupils with the opportunity to develop
creative music making through exploration and experimentation (Spider's Web
Composing Lessons).
Aims:
This paper sets out the theoretical context for the Spider's Web Composing
lessons and the suggested methodologies related to self-directed learning,
reviews the progress to date in the creation of materials, and identifies
further questions and possible strategies for developing critical thinking
skills through musical composition and improvisation.
Main contributions:
In order to bring about change in the way that composing and improvising
activities are conceived within the music curriculum, more feedback is needed
from users of the world wide web based materials. Both formal and informal
evaluative techniques are being used to collect information on teacher, student
teacher and pupil views and attitudes to process based activities in composing
and improvising.
Implications:
Little evidence is available regarding the use of the spider's web composing
lessons other than statistics on the number of 'downloads' from the web server.
Evaluations and feedback will be reviewed in order to shape and inform the
development of musical activities within a 'knowledge unrestricted problem
environment'.
Proceedings abstract
fineberg@music.columbia.edu
Background:
The literature in the field of music cognition has shown that music is
processed as units of 'discourse' unfolding in time, forming larger scale
structures of various levels of complexity. Grouping into larger scale
structures might rely on computation of similarity or difference between units.
Studies on similarity between musical units have, so far, concerned mainly
melodic stimuli.
Aims:
This study investigates features which are used in similarity judgments for
pairs of harmonic phrase units. Stimuli were constructed by varying three
music-theoretically distinct harmonic dimensions and a subtle aspect of rhythm
for a two-measure excerpt from a Chopin Prelude. Stimuli were written in a
number of different musical idioms.
Method:
Results:
Conclusions:
Proceedings abstract
MR TIM HORTON
tjh20@cam.ac.uk
Background:
Although there have been various theories of how primitive structural units in
music might be said to have meaning, such theories are generally unable to deal
with the composition of the meaning components they identify into complex
meaning structures. These theories thus remain somewhat superficial analogies
to conceptions of meaning in other domains, such as natural language.
Aims:
Various formal parallels between tonal structure and linguistic syntax will be
examined. It will be suggested that the functional properties of tonal harmony
play a role in the domain of tonal music analogous to the semantic properties
of natural language.
Main contributions:
Implications:
Proceedings paper
excerpt of music (most often taken from some Bach instrumental piece), a brief definition of what is typically
called "compound melody," and a diagram showing the melodic line separated into multiple voices. But there is
rarely any description of how this separation was determined or any consistent statement about which specific
musical features contributed to that particular parsing of the melodic line (cf. Piston, Kennan).
General principles of auditory stream segregation may help to explain some cases of this type of linear
polyphony. In many ways, these principles coincide with the fundamental claim of Gestalt psychology. As
Lerdahl and Jackendoff describe, this claim is that "perception, like other mental activity, is a dynamic process of
organization, in which all elements of the perceptual field may be implicated in the organization of any particular
part" (Lerdahl and Jackendoff 303). Two Gestalt principles that seem especially applicable to auditory perception
are proximity and similarity. Proximity is basically the idea that listeners tend to perceptually group elements
together that are closer to one another, while similarity refers to the tendency to group elements of similar shape
or other likeness together. According to Albert Bregman, auditory stream segregation seems to follow most
directly from the Gestalt law of grouping by proximity (20).
In audition, the two most important influences on the segregation of tones by proximity are the rate of the
sequence and the frequency separation of different elements within the sequence (Bregman 643). Separations of
this kind represent a bottom-up type of processing, with emphasis on the more detailed, note-to-note level of the
music. A grouping determined according to similarity, however, can work on a variety of different levels.
Examples in music might include the grouping of instrumental sounds by timbre (low level processing) and by
motivic parallelism (larger level processing).
In creating melodies, composers have long realized the influence that these grouping principles have on
perceptual coherence, especially the repetition rate and the frequency separation of tones. For instance, various
studies have shown that much Western music is dominated by small melodic intervals, thereby reflecting the idea
that notes closer together in frequency tend to produce stronger perceptual groupings (Ortmann 7). Even though
many composers seek to achieve this melodic coherence by avoiding any extended use of those features which
are apt to create segregation, others choose to purposely maximize the tendency for tone sequences to break apart
(given a sufficient degree of frequency separation, for instance). In an interesting reference to the very style of
music that Bach's unaccompanied string pieces represent, Bregman states,
Rapid alternations of high and low tones are sometimes found in music, but composers are aware that such
alternations segregate the low notes from the high. Transitions between high and low registers were used by the
composers of the Baroque period to create compound melodic lines - the impression that a single instrument,
such as a violin or flute, was playing more than one line of melody at the same time. These alternations were not
fast enough to cause compulsory segregation of the pitch ranges, so the experience was ambiguous between one
and two streams. Perhaps this was why it was interesting (675-76).
The previously mentioned dissertation by Elias Dann contains one of the few published attempts to separate one
of Bach's monophonic movements into multiple voices. Dann bases his analysis on the assumption that the
melodic function of each individual tone in this music is dependent on the tones that surround it, its rhythmic
placement within a measure or phrase, and whether its range ever crosses into the frequency space of another
voice (199). He also points out that each individual tone has a dual role, first as part of a single melodic line
simply because the tones actually are heard one after the other and then as a member of one of the many
polyphonic lines that can be followed throughout the course of the piece (212-13). Dann then provides his
interpretation of how this music could be separated into multiple voices, using the opening four measures of the
Sarabande Double from Bach's B minor Partita as his material.
As can be seen in Example 1, Dann separates this brief excerpt of music into five different voices, with some
instances of doubling reflecting times when two different lines have coincided or when a single tone functions as
part of more than one polyphonic strand. At least initially, it seems that Dann's analysis is mainly determined by
his interpretation of the opening three tones, a simple root position arpeggiation of the tonic triad. Although he
does acknowledge that these three tones could be heard as a single entity due to the influence of harmony, he
ultimately chooses to interpret them as "carving out an area of musical space in which they will start operation as
three individual voices" (Dann 215). He then goes on to explain in detail the different lines that he sees emerging
out of just the first tone. While it is seen as a sustained note in one voice that remains constant until the leading
tone enters on the downbeat of the third measure, it is also considered the beginning point of both an ascending
and a descending voice, each of which he marks in the diagram with its own symbol.
Example 1.
While further explaining his particular analysis, Dann states the following:
In a polyphonic complex such as the one under consideration, no system of analysis can be expected to present
more than a partial picture of the various voices and their interrelationship. The following analysis does not
presume to be the only possible one, nor even to be entirely correct; it merely attempts to illustrate one way in
which the inner ear may gather together the threads implicit in this piece of one-line polyphony . . . the five staves
have been chosen for convenience, to bring out certain polyphonic relationships and not to argue that there are
exactly five voices to be heard (215).
This brings up the important point of the perceptual relevance of this type of analysis and also returns us to issues
of auditory stream segregation. Although Dann's interpretation does provide an interesting perspective on the
implied polyphony based mostly on the harmonic and rhythmic functions of each individual tone, he admittedly
has not taken into account how this passage might actually be heard by both performers and listeners.
One of the first issues that arises when principles of auditory stream segregation are applied to Dann's analysis is
the way he parses the opening b minor arpeggiation into three different voices. In a 1975 study, Leo van Noorden
presented subjects with an alternation of two tones in varying rates of repetition, with one tone remaining fixed
and the other tone moving to various frequency differences. The subjects' task was to indicate the points at which
the frequency separation became too large to hear one coherent stream and too small for separate streams to be
perceived. Van Noorden essentially concluded that "the degree of association varies inversely as the pitch
difference, or pitch distance", with streams played at high rates of repetition being heard as a single, coherent
unity when the frequency separation was less than five semitones (13). It, therefore, seems unlikely that the
opening arpeggiation of the Bach Sarabande Double would actually be perceptually segregated into three
different strands based on frequency separations of only three or four semitones.
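That reasoning reduces to a one-line check. Taking the opening tonic arpeggiation as, say, B3-D4-F#4 (the octave placement is my assumption for illustration), successive intervals stay below van Noorden's approximate five-semitone coherence boundary:

```python
COHERENCE_LIMIT = 5  # semitones; approximate and rate-dependent (van Noorden)

# Opening B minor arpeggiation as MIDI note numbers (octave placement
# chosen for illustration): B3 = 59, D4 = 62, F#4 = 66.
arpeggio = [59, 62, 66]
intervals = [b - a for a, b in zip(arpeggio, arpeggio[1:])]

segregates = any(abs(i) >= COHERENCE_LIMIT for i in intervals)
print(intervals, "segregates" if segregates else "coheres as a single stream")
```

With intervals of only three and four semitones, the arpeggiation stays on the coherent side of the boundary, matching the argument above.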
Another issue concerns the fact that Dann separated this brief passage into five total voices. In a 1989 study
whose aim was to determine the number of simultaneously sounding polyphonic voices that a listener could
identify and count, David Huron found that a threshold seems to exist at three voices. One subject even
commented after the study that he found himself using two different techniques for determining the number of
concurrent voices. The subject felt very confident in his ability to provide an accurate count when there was a
small number of voices, but instead found himself comparing the density of surrounding textures and simply
estimating the number of voices when the total number was greater than three. As Huron then concluded,
It appears that in the perceptual denumeration of sounds of homogeneous timbre, listeners do not follow the
arithmetic sequence: one, two, three, four, etc. to infinity, but proceed in a manner similar to the counting
language of the San bushmen: auditorily we may count: one, two, three, many - where one might admit only
gradation of "manyness" rather than definite discrete values (378).
After taking into consideration both the lack of extensive research into this issue and the apparent disregard for
fundamental perceptual tendencies, it is clear that a more detailed set of guidelines is necessary in order to shed
greater light on this idea of implied polyphony. For this reason, a simple rule system was created to provide a
concrete and consistent method for parsing these solo instrumental lines into multiple voices. This rule system
focuses on bottom-up conditions, or note-to-note interactions, and intentionally does not take into account every
possible musical parameter. For this reason, some voice changes will be blatant or obvious, while others will be
harder to distinguish or perhaps not present at all. Some degree of such ambiguity certainly seems appropriate
since even the most superficial glance at the movements consisting of mostly chords and multiple stops shows
that Bach did not consistently maintain the same number of voices throughout a single piece. There are numerous
instances where one voice seems to be suspended while other voices move around it, with that original voice only
later reappearing for resolution and further melodic motion. It thus seems reasonable to assume that similar
compositional techniques were applied in the monophonic movements. Again, this is something that Bregman
recognized as stemming from fundamental principles of auditory stream segregation. As he stated,
The alternation of registers in a single instrument tends to produce a more ambiguous percept in which either the
two separate lines or the entire sequence of tones can be followed . . . It is not certain that the composers who
used this technique would have called for such an unambiguous separation of musical lines even if the players
could have achieved it. It is likely that the technique was not used simply as a way of getting two instruments for
the price of one, but as a way of creating an interesting experience for the listener by providing two alternative
organizations (464-65).
In this rule system, weights are assigned to the transitions between different pitches based on the degree to which
three basic features are present. These weights essentially only signify the extent to which these features might act
in conjunction with one another at any single point of transition to suggest a clearer or more obvious change of
voice. Transitions whose weight crosses a threshold of four points are seen to signal a change of voice, thereby
generally ensuring that more than one of the following rules would be enforced. In order to facilitate the analysis
of this entire repertoire, a computer program representing this rule system was created using the Humdrum
Toolkit (Huron 1999). This program works from a file of each original score with all pitches translated into a
succession of ascending and descending intervals.
Rule 1: Interval Size
Given a sequence of four notes (n1, n2, n3, n4), let Int2 = n3-n2.
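To make the mechanics concrete, here is a sketch in the spirit of the rule system described above. The threshold of four points comes from the text; the individual feature tests and weights below are hypothetical stand-ins, since the paper's actual values are not reproduced here.

```python
VOICE_CHANGE_THRESHOLD = 4  # points, as stated in the text

def transition_weight(n1, n2, n3, n4):
    """Score the transition into n3 (MIDI pitches); weights are hypothetical."""
    int1, int2, int3 = n2 - n1, n3 - n2, n4 - n3
    w = 0
    if abs(int2) >= 7:                      # large leap into the note
        w += 3
    if int1 * int2 < 0 or int2 * int3 < 0:  # change of contour around it
        w += 1
    if abs(int3) <= 2:                      # conjunct motion afterwards
        w += 2
    return w

# A made-up line that alternates registers, as a compound melody might.
melody = [74, 62, 60, 59, 76, 64, 62, 60]
changes = [(i + 2, transition_weight(*melody[i:i + 4]))
           for i in range(len(melody) - 3)
           if transition_weight(*melody[i:i + 4]) >= VOICE_CHANGE_THRESHOLD]

for idx, w in changes:
    print(f"voice change at note index {idx} (weight {w})")
```

Because each transition's score combines several features, no single feature triggers a voice change on its own, mirroring the text's point that crossing the threshold generally means more than one rule has been enforced.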
The first step in analyzing the results of this activity was simply to identify the specific places in the music where
students most often notated a change of voice, regardless of which voice they placed that passage into within the
overall texture. A quantitative analysis of the students' orchestrations was then performed by marking each note in
the score with the number of students that signified that note as the beginning of a new voice. These numbers can
be found below each line of the score provided in Example 2. The upper row of numbers then represents the
output of the model described above, with each score reflecting the degree to which interval size, contour, and
conjunct motion combine to suggest a change of voice. The highest possible score for the students' responses is
fifteen, while there is no upper limit for the output of the rule system.
Example 2.
Since these two sets of scores are measuring two fundamentally different things, their significance does not
ultimately lie in their actual numerical values. The numbers, instead, simply reflect the relative strength of any
potential change of voice. In general, then, a comparison of these strengths shows a fairly high degree of
correlation between the model's output and the students' responses. This correlation is most easily seen in the
notes which have been circled in the score, which show places where both the model and at least 10 out of 15
students indicated an obvious change of voice. Due to the large leaps, the dramatic changes in register, and the
instances of stepwise motion, the clearest example of this correlation is found in measures six and seven of the
Allemande, where almost all of the fifteen students consistently indicated the same changes of voice that the rule
system strongly suggests.
There are, however, places in the score which exhibit a much greater degree of ambiguity, where there is more
discrepancy between the output of the model and the responses of the students. This is mostly due to the fact that
the rule system described here essentially only analyzes note-to-note interactions in order to determine the extent
to which bottom-up processing can adequately explain the implied polyphony in this type of music. Many other
larger level structures and processes certainly influence the tendency for this music to separate into multiple
voices, though. Based on the students' responses, rhythmic placement, motivic repetition, timbre, and articulation
are some of the most powerful of these influences.
In measures four and five, for example, a majority of the students only indicated changes of voice in three places
(the circled C, D, and E in Example 2). While these results certainly reflect the fact that larger intervals are more
likely to suggest a change of voice than smaller intervals, they also reflect an awareness of articulation changes
and motivic parallelism. It is evident that the students were recognizing the shift between slurred and separately
bowed notes, as well as the repetition of a two-beat motivic pattern (which is misaligned by one sixteenth note
with the meter). Even though the model indicates additional voice changes inside this repeated pattern, fewer
students chose to include them as part of their analysis. Perhaps this is because these changes are only influenced
by the note-to-note details that the model addresses, not by either one of the larger level groupings that the
students were taking into consideration.
One instance of the influence of meter then occurs in the sequence which begins in measure eleven. A large
majority of the students consistently noticed that the first note of each leg of this sequence creates a descending
melodic line and chose to signal a change of voice prior to each of those notes. The rule system also marks the
same points as changes of voice, mostly because of the descending sevenths and the change of contour that occurs
between each leg. It is interesting to note, however, that there is a corresponding descending melodic line
occurring in the last note of each leg of the sequence. Perhaps this line was not recognized by as many students as
a change of voice because it is not strengthened by a metric accent, thereby supporting Leonard Meyer's idea that
an accent is "a stimulus (in a series of stimuli) which is marked for consciousness in some way" (8).
Although these larger level issues have certainly been recognized and addressed to some degree, their
formalization as part of the model set forth here is still forthcoming. Perhaps the fundamental value of this model
as it currently stands thus lies in its contrast to the ways that this music is typically discussed, especially by
performers and pedagogues. As a final example, noted violinist and pedagogue Yehudi Menuhin stated,
Even though there can be no allowance in the music of Bach for arbitrary effects, personal indulgence, or changes
of direction, as there are indeed in the romantic literature, there is every justification for a flexibility, a fluidity of
line, a play of accent, colour and stress within a given series of notes, but only of course when these are justified
by a sensitive and disciplined musical intuition and by an intellectual awareness . . . For instance, although many
of Bach's movements for solo violin and particularly for 'cello are written in one voice, that is without
counterpoint and harmony, the counterpoint and the harmony are in fact implied and every effort must be made to
bring the different voices out clearly, even though there is never more than one voice sounding at a time (119).
Although this statement acknowledges both the existence of this implied polyphony and the need for
flexibility and sensitivity in its performance, Menuhin does not actually describe how to identify this
polyphony or how to "bring the different voices out clearly." Perhaps this phenomenon is related in some way to basic
ideas about automaticity, which suggest that someone who has mastered any technique or skill often has a
difficult time adequately describing what he is doing and instead prefers to teach by demonstration or imitation. It
is possible that many musicians have developed their skill to the extent that they no longer consciously think
about what might be considered the fundamental technical aspects of the music, instead turning their thoughts to
larger level groupings or phrasings. In turn, this results in an underestimation of the power of these note-to-note
details that are the very focus of the rule system presented here.
For this reason, it is important to have some method which can assist in clarifying the issue, thereby helping to
identify those features of the music that might otherwise be overlooked, taken for granted, or left unexplained.
Although it is unlikely that any single system could fully account for all aspects of this intensely complex music,
this study has shown that a model based on a small number of simple guidelines is actually a fairly powerful
indicator of how musicians might interpret the implied polyphony in this piece. As performers and pedagogues
become more aware of these specific polyphonic potentials, they also become more conscious of the expressive
potential in this music. Ultimately, an informed performer has utmost freedom to either let the implied polyphony
emerge on its own or to provide added emphasis through the use of a variety of expressive techniques, thereby
making this structure more perceptually relevant to audiences of all kinds.
REFERENCES
Bregman, A. S. (1994) Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT
Press.
Cooper, G., Meyer, L. (1960) The Rhythmic Structure of Music. Chicago: University of Chicago Press.
Dann, E. (1968) Heinrich Biber and the Seventeenth Century Violin. Ph.D. Diss., Columbia University.
Huron, D. (1989) Voice Denumerability in Polyphonic Music of Homogeneous Timbres. Music Perception, 6 (4),
361-382.
Huron, D. (1999). Music Research Using Humdrum: A User's Guide. Stanford, California: Center for Computer
Assisted Research in the Humanities.
Kennan, K. (1987) Counterpoint: Based on Eighteenth-Century Practice, 3rd ed. Englewood Cliffs, N. J.:
Prentice-Hall.
Lerdahl, F., Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Menuhin, Y. (1976) The Violin. New York: Schirmer Books.
Ortmann, O. (1926) On the melodic relativity of tones. Psychological Monographs, 35, Whole No. 162.
Piston, W. (1947) Counterpoint. New York: W. W. Norton & Company, Inc.
Stevens, D. (1976) "Bach's Sonatas and Partitas for Violin." In Violin and Viola. New York: Schirmer Books.
Van Noorden, L.P.A.S. (1975) Temporal Coherence in the Perception of Tone Sequences. Unpublished doctoral
dissertation, Eindhoven University of Technology.
Proceedings abstract
Alf Gabrielsson
Department of Psychology
Uppsala University
Box 1225
SE - 751 42 Uppsala
SWEDEN
Background:
Studies on emotion in music can be divided into two main categories: studies focusing on recognition of emotional
expression in music and studies focusing on listeners' own emotional response to the music (induced emotions). The
distinction between these categories is not always upheld in investigations.
Aims:
This paper aims at comparing and discussing methods and results of studies in both categories mentioned above with
regard to parallels and contrasts.
Main Contribution:
On the basis of extensive reviews of empirical studies, various classifications of both recognised and induced emotions
have been made. Classifications may be data-driven, theory-driven, or both. In either case, a basic classification into
aspects of valence and arousal seems feasible. However, it has to be supplemented in various ways to do reasonable
justice to the manifold nature and subtleties of emotions recognised in or induced by music. Results depend to a
great extent on the music used, voluntarily or involuntarily chosen, as well as on the methods for obtaining and analysing data.
Furthermore, results are by necessity influenced by listener and situation characteristics.
Implications:
Investigations and discussions on emotion in music would benefit from a clearer distinction between recognised and
induced emotions. Although this distinction may be somewhat blurred, awareness thereof should help in interpreting
results, reveal inconsistencies, and contribute to increasing both internal and external validity of studies on emotion in
music.
Proceedings paper
mystical aura that surrounds the concept of creativity can be said to exist in the absence of a complete
theoretical explanation of the phenomenon (Finke, Ward & Smith, 1996). However, Boden suggests
that creativity may be no more mysterious than other unconscious processes and systems, such as
vision, language, and commonsense reasoning (p. 75). The essence of the novelty in artistic creativity
may be metaphorical thinking. All humans are likely to use such thinking, and perhaps people who are
creative, such as artists and scientists, simply use it more often or to more focussed purposes
(McKechnie, 1996). Alongside novelty, unconscious processes and metaphor, another element
common to a number of accounts of creative thinking is the juxtaposition of two seemingly
contradictory ideas. Rothenberg (1994) refers to this as a Janusian process: the ability to hold two
competing, contradictory ideas, images or concepts in mind simultaneously. He proposes that
creativity is the synthesis or coalescence of these. Koestler (1964) pointed to useful distinctions
between creativity as it appears in humour (the collision of matrices or planes of thought), in science
(integration), and in art (analogy). More recent accounts of creativity emphasize processes of problem
solving and problem finding (Kay, 1994; Wakefield, 1994). Putting these notions together, Boden
argues that a theory that considers unexpected combinations, together with a psychological
explanation of analogy, may suffice as a theory of creativity.
Creativity in Choreographic Cognition
By nature contemporary dance is difficult to study as it is ephemeral and, unlike a musical score,
painting or sculpture, there are few notes of the development of the work or even good records of all
aspects of the performance. Fortunately, since early 1999 a collaborative research team involving the
Victorian College of the Arts, dance industry partners, and researchers in Australia has captured on
digital video the inception and development of new dance works by two elite choreographers. We
draw on this video and journal documentation seeking examples of problem finding and problem
solving, metaphorical thinking, and evidence of the synthesis of competing ideas.
An important characteristic of creativity in contemporary choreographic cognition is that dancers and
choreographers increasingly work together exploring, selecting, and developing dance material.
Australian choreographer Anna Smith developed new dance material working closely with eight
experienced and professional dancers over a period of six months. The dance materials were generated
from improvisations of the whole group. At one stage, spoken cues were given to the dancers such as
'Right elbow behind back, shoulders tilting, left hand reaching' and each dancer interpreted the cue.
Individual solutions were found and the group gradually selected and developed the interpretations
made by one or more of the dancers. Importantly, the choreographer was not in control of the material
thus generated but the choreographic process took place through interactive dance-making to which
everyone contributed (McKechnie & Grove, 2000). An explanation of creativity in choreography must
therefore address the complex of dynamics and interactions among dancers and choreographer in this
community of creative minds. In addition to motivation, memory, and personality factors that
underpin the individuals' thoughts and behaviour, there are dyads and triads within the group and
concomitant ideas, tensions, conflicts, attractions and defences. Thus the social and cognitive
psychologist searching for a fresh domain to test current theoretical assumptions will be pleased with
the uncharted territory offered by choreographer and dancer interactions.
Instances of problem finding and problem solving in choreographic cognition are easily found. The
development of movement as art brings with it challenges of the limits of the human body and the best
use and negotiation of the dimensions of space and time. Although difficult to capture in writing, video
footage of Smith and her team demonstrates the cognitive complexity of a segment that involved rapid
and continuous whole body movement from all dancers with each performing a different series of
complex transitions. As well as the motor and spatial complexity of each transition, the dancers were
to carry out their individual movements while the group traced a DNA-like double helix. Before the
sequence could be performed, a logistical analysis was carried out to determine a way in which it
could work spatially. Finally, movement of the complex spatial configuration of parts (dancers) and whole
(group forming the helix) was realized using colour-coded paper trails of the path of each dancer.
Thus the spatial and temporal configuration was modelled with concrete materials and after much
analysis and trial and error it was achieved in real time and space.
In another example, McKechnie describes creation of a work commissioned for a very small dance
space. The space elicited ideas and images related to the use of simple forms in small spaces.
Pondering this problem led to images of Ikebana, to the similar asymmetry of human lives lived in
close contact in small rooms, to the alienation of separate lives closely entwined spatially but
separated by emotional chasms. The source of the solution to the problem lay in synthesis between the
imagery of confined spaces and the experience of contained tensions. A final example involves
synthesis of real and imagined time in the perceptions of observers. Amplification, choreographed in
1999 by Phillip Adams, reflects an interest in the contemporary cult of the pornography of car crashes.
The choreographer faced a problem of how to represent a distorted experience of time in dance terms.
The seemingly endless expansion of time experienced by car crash victims during the few seconds of
a violent accident became the source of a central image. The problem of conveying the nature of the
experience in real time was solved by breaking up movement material into brief distorted and
fractured components and performing a long and complex sequence of them at a tempo verging on the
perilous, a feat accessible only to highly trained contemporary dancers. The presence of imagery is
evident in these two examples and in most accounts of creativity. The examples also demonstrate that
imagery can occur in all sensory modalities and in contemporary dance, unlike other artforms, the
creative search is embodied in the human form.
Memory and Imagery in Rehearsal and Performance of Contemporary Dance - The Performer
Although choreography and contemporary dance have only rarely captured the interest of
experimental psychologists, classical ballet and contemporary dance have been used as tools to
examine coding in human short- and long-term memory. Results include the observation that memory
for complex movements is more kinaesthetic than verbal (Starkes, Caicco, Boutilier & Sevsek, 1990).
Anecdotal accounts suggest that recall is often multi-modal such that activity in one mode triggers
knowledge or recall in another. Smyth and Pendleton (1994) used an interference paradigm and
measured effects of articulatory and movement suppression. Dancers' spans were longer than those of
non-dancers for both classical ballet and modern movement and both articulatory and movement
suppression decreased the dancers' spans. This implies that material is coded, at least in the short term,
in both verbal and kinaesthetic form. Long-term memory for dance material has been examined by
Solso and Dallob (1995). They propose that a class of movements is represented abstractly in memory
in the form of a prototype. Solso & Dallob conclude that there is an underlying scheme that governs
the formation of body actions in general and dance routines in particular and that it may be possible to
determine basic laws of motor performance and transformation as part of a comprehensive theory of
dance 'grammar' and general kinaesthetic 'grammar'.
Of the experimental studies of memory for dance most have used classical ballet in which a sequence
of prescribed steps is drawn from an established repertoire of labelled formal movements. By contrast,
contemporary dance frequently consists of idiosyncratic movement derived from the theme being
explored and is less easily reduced to verbal description. At one point in developing Red Rain the
dancers commented on the extraordinary amount of information they needed to retain while working
with new and demanding movement material. On another occasion a dancer watched herself perform
a slow and intricate move on video but had little recollection of performing the movement or how she
made her body move in a particular way. Such observations have implications for memory in
choreographic cognition. One testable hypothesis is that verbal labels or cues for single movements
(such as 'Deirdre's wrist; Kathleen's sitting bones; Nicole's no. 3') are used initially. Over time, longer
and more complex movements are sequenced, rehearsed, and chunked in long-term memory. With
repetition, the entire sequence becomes part of kinaesthetic memory. A crucial question then arises:
what is the nature of the representation in memory that stores and integrates visual, auditory,
propositional, spatial, temporal, and kinaesthetic features?
Imagery is used extensively in dance because "an arsenal of images has the ability to find a concise
way of describing a movement" (Smith, 1990, p. 17). Using psychometric tools, Overby (1990)
showed that experienced dancers differed significantly from novice dancers on three of four imagery
ability measures, namely body image, cognitive imagery and spatial ability. Interpreting results on the
Individual Differences Questionnaire (IDQ), Overby suggested that, while novice dancers preferred a
visual mode of thought, experienced dancers were equally inclined to verbal and visual modes of
thinking. She speculated that dance experience may be related to a tendency to process visual and
verbal information on equal terms. Foley, Bouffard, Raag & DiSanto-Rose (1991) demonstrated that
subjects who performed movements or imagined themselves performing them were better at
recognizing the movements than those who observed or imagined another performing. This
self-performed task effect was evident only for "uncommon" movements (modern dance and ballet).
Visualising movement and movement patterns is now common in sport and in systems of training in
kinesiology (Sweigard, 1974). Foley et al. concluded that further research is needed to establish
whether imagery abilities differ in general or specific ways across expert and novice dancers. The
ideas of Damasio (1999) and improved methods of investigation using new scanning technology are
likely to contribute to our understanding of imagery and movement.
In his analysis of Red Rain development footage, Grove (1999) noted that, as the dancers explored
movement triggered by a verbal cue, they appeared to intellectualise their task. Paradoxically, "the
movement became more internal, establishing its own pathways through the body, internal
realizations, instead of relying on a picture or mirror-image of what the spectators see". For Grove, "it
was as if the piece was being created from the inside out". There is imagery indeed behind the
movement created and explored by the eight dancers but the way in which experimental studies have
considered imagery seems to fall short of the processes involved in an actual creative act. Experiments
have dichotomised memory codes as either propositions or non-verbal structural descriptions and
images. However, categorisation of movement in terms of what it is not, i.e. non-verbal, is
uninformative and simply reflects the lingua-centric bias of cognitive psychology. In its stead, Grove
refers to dance-making as "an utterance of the body". The artist, whether poet or choreographer, does
not necessarily start out with words or a visual image, but instead material may come from a pulse or
a rhythm. A challenge for the experimental study of choreographic cognition is to divest itself of
reference to verbal versus non-verbal features and turn to the seemingly simple notion of a generative
pulse or rhythm. A dynamical view is based on this simple but powerful assumption.
As the medium of contemporary dance is time we propose that the artistry of movement is in
trajectories, transitions, and in the temporal and spatial configurations in which moves, limbs, and
bodies relate to one another. Choreographic cognition can be conceived as a dynamical system wherein
change to a single component can affect the entire interacting network of elements. In a dynamical
system, time is not simply a dimension in which cognition and behaviour occur but time, or more
correctly dynamical changes in time, are the very basis of cognition.
Meaning and Communication in Contemporary Dance - The Observer
The power of movement and dance to evoke memories has been identified as an important factor in
the communication achieved via contemporary dance. Hanna (1979) suggests that affective and
cognitive communication in dance are intertwined and she gives a broad account of the way in which
emotion is communicated. For example, physical movements associated with affect may stimulate or
sublimate a range of feelings and may be elicited for pleasure or coping with problematic aspects of
social involvement. Adults may find succour and release cathexis in culturally permissible motor
behaviour; this may be reminiscent of nurturance and protection of prenatal and infancy stages and
imitates satisfaction of childhood behaviour. Dance may communicate a kind of excitement; it may also
provide a healthy fatigue or distraction that may abate temporary crises. Examples of the intoxication
that occurs with rapid movement abound. Such therapeutic matters are unlikely to be of concern to the
choreographer. However, such responses on the part of the observer constitute communication and
will reinforce pursuit of dance as art or entertainment. The psychological issue that remains is to
explain the mechanism that underpins release and cathexis. Sympathetic kinaesthesia is one possible
explanation.
Conversations with elite choreographers and dancers suggest the presence of intriguing somatic and
kinaesthetic processes when they observe dance performance and this leads to many possibilities for
research into communication via kinaesthetic perception. Anecdotal reports suggest that expert
observers actually feel the movement or feel as if they are performing the movement; a kind of
sympathetic kinaesthesia. One way to examine this would be to take detailed physiological recordings
of changes in tension, galvanic skin response, muscle response, heart rate, and blood pressure, as an
observer watches a performance. We can examine the effects of differing levels of experience and
performance expertise. We can also assess the way the presence of music might moderate
physiological change. Finally, we can ask whether there is evidence of comparable physiological
change in other performance artists such as elite musicians as they observe a virtuosic performance on
their instrument. Is observation for all elite artists a virtual performance? Indeed, one could imagine
that if such muscular and physiological changes occur during mere observation then styles of dance
do not evolve or change from simply watching seminal works but aspects of the performance may
literally be stamped into the choreographer's kinaesthetic memory: a kind of virtual plagiarism!
Interestingly, recent neurophysiological findings suggest a mechanism that may underpin sympathetic
kinaesthesia. Neurons have been identified in both monkeys and humans that fire according to
particular actions of the hands and mouth, rather than with the individual movements that form them.
Furthermore, a class of these same neurons fires when the action is observed being performed by
another (Di Pellegrino, Fadiga, Fogassi, Gallese & Rizzolatti, 1992; Fadiga, Fogassi, Pavesi &
Rizzolatti, 1995). Rizzolatti and Arbib (1998) suggest that the mirror system represents in an observer
the actions of another. If this is so, then as we observe a dance performance particular neurons are
firing that represent particular dance actions in us.
A Dynamical Systems View of Choreographic Cognition
Time is the glue, the medium of choreography and contemporary dance and, for this reason,
contemporary dance lends itself to analysis in terms of dynamical systems theory. In this theory
complex wholes and forms emerge from simple elements and in self-organising dynamical systems
structures emerge from chaos. It is possible to apply the dynamical view to identify pulses, rhythms,
patterns that spark an idea that is utterable in movement. The pulse or rhythm can occur in any
modality but, for the creative choreographer, will be expressible as a composition of movement in
space and time. Ultimately, we can apply the notion of dynamical systems to better understand,
possibly to model, the movements and form of a single body, or many bodies, in space and time.
Another artform - music - has been described as a dynamical system (Burrows, 1997; Sloboda, 1998)
and Sloboda's account in particular is relevant to dance. Sloboda argues that meaning in music comes
from the way it embodies the physical world in motion. Human understanding of music comes from
our capacity for analogical thinking. If contemporary dance too embodies the physical world in
motion it may be doubly powerful in that it can be understood both by analogy and by direct
perception. That is, the trajectory of objects in motion, through time, is the very stuff of dance - real
objects moving in real space and real time. Adams' Amplification is a good example of a
contemporary dance work that can be understood both directly and by analogy.
A dynamical systems view of choreographic cognition holds that behaviour is continuous and each
component acts and interacts with others in the system. Each state of the system determines the next
state so that a structure or form evolves. Change occurs at many time scales and change at one scale
shapes and is shaped by change at others. The process is one of self organization where solutions
emerge to problems defined by particular constraints of the immediate situation (Thelen, 1995). In the
context of movement there will be a number of physical constraints that will influence and determine
the evolving form. Constraints might include mass, limb structure, size, weight, flexibility, space
limitations, and so on. Within the set of constraints there will only be a certain number of possibilities
so that the evolving movement is determined by what has come before and the context in which the
movement is set. Importantly though, the movement is flowing, continuous, transitional - it is motion
rather than simply 'moves' or 'steps'.
Constructing and Testing a Dynamical Theory of Choreographic Cognition
One of the first tasks for the development and evaluation of a dynamical theory of choreographic
cognition is to specify the constraints and features/variables of the system. Such information may be
procured from detailed three-dimensional analysis of (initially) simple movements/transitions. Even a
five-second sample of a transition in a modern dance piece will generate a multitude of possible
features. The simplest starting point would be to identify a feature as a single oscillator and
demonstrate entrainment and coupling to a relatively simple motion task. Thelen (1995), Saltzman
(1995) and Large and Jones (1999) provide examples of the way in which single- and
dual-degree-of-freedom oscillatory models lead to testable predictions of human timing behaviour. Although a single
oscillator is unlikely to capture the richness of the creation and performance of modern dance, an
investigation of a single oscillator model of dance-like movement or body transition would provide
the needed existence proof of the viability of the dynamical approach.
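As a minimal existence sketch of this kind (an illustration, not a model drawn from any of the studies cited), a single adaptive oscillator can be driven by an isochronous pulse train and shown to entrain its period to the pulse; the coupling and adaptation constants below are illustrative assumptions.

```python
# Minimal sketch of a single adaptive oscillator entraining to an
# isochronous pulse, loosely in the spirit of the cited oscillator
# models; the coupling and adaptation constants are illustrative guesses.

def entrain(osc_period, pulse_period, coupling=0.4, adapt=0.2, n_pulses=50):
    """Drive an oscillator with periodic pulses; return its final period."""
    period = osc_period
    phase = 0.0                        # oscillator phase at each pulse, in cycles
    for _ in range(n_pulses):
        phase = (phase + pulse_period / period) % 1.0
        # Signed phase error relative to the pulse (range -0.5 .. 0.5).
        error = phase if phase <= 0.5 else phase - 1.0
        phase -= coupling * error      # phase correction toward the pulse
        period *= 1.0 + adapt * error  # period adaptation toward the pulse
    return period

# An oscillator starting fast (0.5 s period) locks onto a 0.6 s pulse train.
print(round(entrain(0.5, 0.6), 3))
```

After a few dozen pulses the oscillator's period settles close to the pulse period, which is the entrainment behaviour such a model would need to exhibit before richer dance-like movement could be addressed.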
At a higher level of complexity, detailed analysis of dances (in Adshead, 1988) and the movement
notation system of Laban (1975) are fertile ground for identifying the key features and movement
variables in contemporary dance. The three broad categories in Laban Movement Analysis are use of
the body, use of space, and use of dynamic energy. Adshead expands on these concepts to include
analysis of relationships between the parts and the whole, and of interpretation and evaluation. Precise
elements are then defined within each of the categories. To take an example, movement may leave
straight lines as vapour trails, an action may result in curved and arc-like trails, and other motions
leave behind complex three-dimensional loops, twists, and spirals. These so-called trace-forms can
then be analysed in more detail: linear trace forms may be accomplished simply with flexion and
extension of various joints; curved trace-forms require abduction/adduction and sometimes rotation.
Technology now provides us with motion capture systems via digital video and computer imaging
that can reveal the most subtle and complex movement pathways. To scrutinize samples of
outstanding contemporary dance works with the tools of dynamical systems theory and digital
technology would be a fascinating and informative exercise.
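As one illustration of how such trace-forms might be quantified from motion-capture data (a hypothetical sketch, not a procedure taken from Laban or Adshead), a sampled trail can be classified as linear or curved by comparing its chord length with its path length; the 0.95 threshold is an arbitrary assumption.

```python
# Hypothetical sketch: classifying a sampled trace-form as "linear"
# or "curved" from motion-capture points (x, y, z). The straightness
# ratio (chord length / path length) is 1.0 for a straight trail and
# drops as the trail bends; the 0.95 threshold is illustrative only.

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def classify_trace(points, threshold=0.95):
    """Label a sampled trail by its chord-to-path straightness ratio."""
    path = sum(dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    chord = dist(points[0], points[-1])
    return "linear" if chord / path >= threshold else "curved"

straight = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 3, 0)]
arc = [(0, 0, 0), (1, 1, 0), (2, 1, 0), (3, 0, 0)]
print(classify_trace(straight), classify_trace(arc))  # linear curved
```

Loops, twists, and spirals would need richer measures (curvature and torsion along the trail), but even this crude ratio shows how digital capture can make trace-form categories computable.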
A dynamical view of choreographic cognition has great explanatory power in that it is relevant to the
three key actors we have described - choreographer, performer and observer. Our dynamical view
proposes that the basis for an idea in movement can come from a pulse, beat, rhythm, or action. The
task and artistry of the choreographer is to notice such a pulse and to express it in bodily form. The
germ of an idea may multiply and develop so that from a single movement other variations,
approximations, caricatures, inversions, emerge. (At some later stage the movement may be described
using verbal language or visual images - but this is not necessarily its original form). In dynamical
terms, there may be structure and order, perhaps self-similarity, that emerges from apparent chaos.
Complexity increases when choreographer and dancers interact and dancers perform - transitions,
explorations continue to be conceived as a state space of many dimensions. Finally, for the observer,
there is understanding from recognition, perhaps via analogy, of objects, organisms, moving in space
and time and experiencing the conflicts, tensions, resolutions of a biological object negotiating the
world, a world of time, space, and others. Within the dynamical scheme, a dance work consists of
transitions of elements in high-dimensional space. The meaning for choreographer, dancer and
observer lies in the dynamics of these transitions and their embodiment of the physical and biological
world.
Author Notes
This research was supported by an Australian Research Council SPIRT grant. Details of the project
Unspoken Knowledges and Red Rain can be found at http://ausdance.anu.edu.au/unspoken
References
Adshead, J. L. (Ed.) (1988). Dance Analysis: Theory and Practice. London: Dance Books.
Boden, M. A. (1996). What is creativity? In M. A. Boden (Ed.), Dimensions of Creativity. Cambridge,
Mass: MIT Press. pp 75-117.
Burrows, D. (1997). A dynamical systems perspective on music. The Journal of Musicology, 15,
529-46.
Damasio, A. R. (1999). The Feeling of What Happens: Body and Emotion in the Making of
Consciousness. Harcourt Brace.
Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992). Understanding motor
events: a neurophysiological study. Experimental Brain Research, 91, 176-180.
Fadiga, L., Fogassi, L., Pavesi, G., & Rizzolatti, G. (1995). Motor facilitation during action
observation: A magnetic stimulation study. Journal of Neurophysiology, 73, 2608-11.
Finke, R. A., Ward, T. B., & Smith, S. M. (1996). Creative Cognition: Theory, Research, and
Applications. Cambridge, Mass: MIT Press.
Foley, M. A., Bouffard, V., Raag, T., & DiSanto-Rose, M. (1991). The effects of enactive encoding,
type of movement, and imagined perspective on memory for dance. Psychological Research, 53,
251-59.
Grove, R. (1999). In the house of breathings. Proceedings of the Second International Dance
Research Conference. Auckland, New Zealand: Danz.
Hanna, J. L. (1979). To Dance is Human: A Theory of Nonverbal Communication. Austin: University
of Texas Press.
Johnson-Laird, P. N. (1988). Freedom and constraint in creativity. In R. Sternberg (Ed.), The Nature
of Creativity. Cambridge: Cambridge University Press.
Saltzman, E. (1995). Dynamics and coordinate systems in skilled sensorimotor activity. In R. Port
& T. van Gelder (Eds.), Mind as Motion: Explorations of the Dynamics of Cognition. Cambridge,
Mass: MIT Press. pp 69-100.
Wakefield, J. F. (1994). Problem finding and empathy in art. In M. A. Runco (Ed.), Problem Finding,
Problem Solving, and Creativity. Norwood, NJ: Ablex Publishing Corporation. pp 99-115.
Wales, R., & Thornton, S. (1994). Psychological issues in modelling creativity. In T. Dartnall (Ed.),
Artificial Intelligence and Creativity. Netherlands: Kluwer Academic Publishers. pp 93-105.
Proceedings paper
INTRODUCTION
Since the introduction of the National Curriculum (1988) in schools in England and Wales, it is now
required that all children study music up to the age of 14, and composition forms a large part of this.
Composition is defined very broadly in the primary school and refers to the briefest musical
utterances, as well as to more sustained inventions. This paper reports three studies of children's
collaborative music composition.
Collaborative work among children has been the focus of much research since the initial writings of
Piaget and Vygotsky. Essentially, it was claimed that children working in pairs or groups on any kind
of task can achieve a higher level of understanding than any one child could achieve alone (see Doise
& Palmonari, 1984). Research has since sought to explain what is learned through social interaction
and how the interaction takes place.
Research of this kind is vast, and has looked at children working collaboratively on a wide range of
scientific tasks, such as mathematical problem solving, logical reasoning and so on (e.g. Tudge &
Rogoff, 1989). Little of the research to date has examined the role of peer collaboration in creativity,
where goals are less clearly defined and measures are more ambiguous. It could be that factors found
to be responsible for productivity in the science-based tasks differ from those responsible for
productivity in creative tasks.
Much of the previous (science-based) peer collaboration research suggests that the most important
element of task activity in groups is the dialogue among group members (e.g. Tolmie et al., 1993). The
recurring theme is one of sharing ideas verbally, arguing through alternatives, and providing
justifications for accepted and rejected solutions. That is, the more of this type of talk that occurs
among collaborating children, the greater their productivity. It is suggested here that in collaborative
music composition tasks, rather than discussing their ideas, the children would prefer to try their ideas
out directly on the musical instruments, and thus somehow communicate with each other through the
music itself.
Possible support for this hypothesis comes from studies of computer based problem solving tasks.
Pheasey & Underwood (1995) and others have found evidence of peer facilitation effects but low
levels of verbal interaction. Children working on a computer based task were found to produce higher
quality work when they collaborated with a partner than if they worked alone, but they did not appear
to be sharing their ideas verbally. Subsequent analyses revealed that the children were trying their
ideas out directly on the computer, thus they were said to be communicating with actions rather than
words. It is suggested that children working on music composition tasks will work in the same way.
The gender composition of the collaborating group is reported by previous researchers to be a salient
factor (e.g. Underwood, 1994). A detailed examination of the gender issues is beyond the scope of the
present paper, therefore this paper will consider only two key issues: firstly, the previous finding that
in mixed gender groups, boys tend to dominate over girls; and secondly, that single gender groups
tend to achieve better results than mixed gender groups. It should be noted that all of these findings
come from studies of science-based tasks and it is important to examine these issues in relation to
creative tasks.
To summarize, the overall aim of the present research was to examine what factors within a group of
children lead to the production of a good music composition. Particular attention was paid to the
amount of verbal interaction, to whether the children communicated their ideas through the music,
and, if so, to whether this form of communication was important for group productivity. Also of interest was how
the gender composition of the group affected the composition process and the quality of the work
produced.
The assessment of quality in music composition is a difficult area. It is argued here that assessment
procedures need to be task specific and should be tailored to meet the needs of the researcher and the
demands of the task. Thus, procedures to assess the compositions are discussed within the context of
each of the studies. For a detailed discussion on these issues see Morgan (1998) and Webster (1992).
Three studies are reported here that differ only in the nature of the task given to the children. The first
task was a representational composition task, in which children were asked to compose a piece of
music to represent the events of a story. The second task was a formal music composition task and
required children to compose a piece of music 'that has a beginning, middle and end'. The third study
used an emotion-based composition task, and required the children to compose a piece of music 'that
will make me happy'. For a detailed discussion on these three types of task, see Barrett (1995).
STUDY 1
A TRIP TO THE SEASIDE: A REPRESENTATIONAL MUSIC
COMPOSITION TASK
METHOD
Eighty-eight children aged 9-10 were put into groups of 4 of varying gender compositions. The music
composition stimulus was a story about a family's trip to the seaside, which is presented in full below.
The children were asked to work together to produce a series of sounds or music to represent the
events of the story. They were given four musical instruments: a xylophone, a drum, a triangle and a
cabasa. It was ensured beforehand that they had prior experience of this kind of task, and of these
kinds of instruments. The children were told that they would have 20 minutes to work on the task and
they were videotaped throughout. They were also told that they would be asked to give a final
performance of their composition for the video camera.
Process variables
The collaborative working period was assessed independently of the performance of the finished
composition. Several aspects of the children's interaction during the working period were timed with a
stopwatch.
Total talk for each child was sub-divided into task directed talk, time spent reading the story aloud,
off task talk and interaction with the researcher.
Task directed talk was defined as any talk directed towards the successful completion of the task.
This type of talk included the presentation of ideas and suggestions to other group members, the
discussion of alternatives and the justifications of accepted and rejected solutions. Task directed talk
was therefore assumed to be indicative of attempts to share the social reality of the problem-solving
situation.
Read was simply the time spent reading the story aloud. This was included because it comprised a
large part of the child's talk time, and while it was task directed by nature, it was not seen as actively
sharing one's ideas with other group members.
Off task talk was defined as any talk not directed towards completion of the task, suggesting time out
from actively working to complete the task.
Interaction with the researcher was any time spent talking to the researcher, including questions of
help.
Similarly, there were two sub-variables of total time playing the instruments: task directed play and
exploratory play.
Task directed play was defined as play directed towards completion of the task and towards other
members of the group. This definition included the presentation of ideas directly on the instruments,
and was viewed as an alternative means of sharing the social reality.
Exploratory play refers to the exploration of the sound materials, and was seen as being directed
towards the individual, or 'playing for oneself'. This type of play was not seen as contributing to a
mutual understanding of the task, and did not move the group closer towards establishing shared
understanding or towards the completion of the task.
To assess the possible effects of the gender composition of the collaborating group in terms of verbal
and musical interaction and subsequent group performance, the type of group in which the children
were placed was coded as consisting of all boys, all girls or mixed gender.
Assessment of the Compositions: The Selectivity Rating Scale
A five-point rating scale was developed to assess the quality of the compositions. The scale provides
guidelines to assist the raters in their marking of the compositions and is presented in full below.
Three independent raters used the scale to give each of the compositions a mark out of 5. The group
score was therefore the mean mark given by the three raters.
The essence of this scale is the extent to which the children display selectivity or discrimination of
both the actions and events within the story, and of the instruments chosen to represent these. There
are many actions and events within the story that could be represented by an infinite number of
musical sounds. The children must then select a variety of actions or events from the story and decide
how to illustrate these with the available musical apparatus. In this way, groups of children who score
well on the rating scale will be those who demonstrate a certain degree of musical thinking, apparent
in this context through the selection and rejection of sounds. This is based largely on Swanwick's
(1979) three proposed criteria for attempting a definition of music, namely selection, relation and
intention. For a full discussion of the development of this scale see Morgan (1998).
Rating Scale
Score 1:
All sound effects* are played, with no evidence of selection or discrimination. Sound
effects are stereotyped. No evidence of decision making as to which sound should
represent which event or action within the story. No apparent organisation.
Score 2:
More selective with a sense of unity. One or two instruments have been chosen to
represent certain elements of the story. The sound effects tend to focus on events, rather
than actions, and are still very stereotyped. Little structural control and the impression of
spontaneity without development of ideas.
Score 3:
Further selection of events/actions and of instruments is apparent. Sounds become more
appropriate and more inventive. Evidence of a structure to the finished piece.
Compositions still rather predictable.
Score 4:
More selective still. Less narrative. Clear beginning and ending.
Score 5:
High level of selection and discrimination, of both the events/actions chosen and of the
instruments. Clear beginning, middle and ending. A more abstract level than previously.
Equal representation of events, actions, emotions, etc.
*The use of the term 'sound effects' is for descriptive clarity only. At no time was it suggested to
the children that they work on producing a series of sound effects. For the children, the emphasis was
put on the transformation of elements within the story into a musical medium.
RESULTS
Verbal and musical interaction
Table 1 shows significant relationships between group productivity, as determined by the selectivity
rating scale, and task directed talk (r=.47, p<.05) and task directed play (r=.44, p<.05). There were no
significant relationships between group productivity and the time spent reading the story aloud, off
task talk, interaction with the researcher or exploratory play. A t-test revealed that there was
significantly more talk than play (t=2.30, p<.05).
Table 1: Pearson correlations between the process variables and the group score.
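The analyses reported above (Pearson correlations between timed process variables and group scores, and a paired t-test comparing talk with play within groups) can be sketched in a few lines. The data below are hypothetical, purely for illustration; the study's raw timings are not reproduced in the paper, and only the test statistics are computed here (the reported p-values additionally require the relevant sampling distributions).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """t statistic for a paired-samples t-test on the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n))

# Hypothetical per-group data: seconds of task directed talk and play,
# and the group score (mean of three raters, maximum 5).
task_talk = [310, 250, 420, 180, 365, 290, 240, 400]
task_play = [150, 200, 120, 90, 260, 170, 140, 230]
group_score = [3.3, 3.0, 4.0, 2.3, 3.7, 3.3, 2.7, 4.3]

print(f"talk vs score: r = {pearson_r(task_talk, group_score):.2f}")
print(f"play vs score: r = {pearson_r(task_play, group_score):.2f}")
print(f"talk vs play:  t = {paired_t(task_talk, task_play):.2f}")
```

A positive t here corresponds to the study 1 finding of more talk than play; in studies 2 and 3 the sign of the comparison reversed.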
Gender
Table 2 suggests that the all-girl groups achieved the highest marks for their finished compositions
(mean 3.60), followed by the mixed gender groups (mean 3.30), then the all-boy groups (mean 3.14).
These differences were not significant.
Table 2: Mean group scores by gender composition (maximum score = 5).
In the mixed gender groups, a series of t-tests revealed that the girls engaged in significantly more
total talk than the boys (t=2.02, p<.05).
DISCUSSION
Verbal and musical interaction
An important finding in this study was the significant relationship between task directed play and
group productivity. This suggests that the children were communicating their ideas through the music,
and that this type of communication was important for group productivity.
A significant relationship was also found between group productivity and task directed talk, and there
was significantly more talk than play. It is suggested here that this may be due to the nature of the
task. The stimulus was highly verbal and the children's ideas may be adequately expressed verbally.
An alternative task is necessary to study this further.
Gender
In this study, the girls in the mixed gender groups talked significantly more than the boys. This
finding is in contrast to those of previous peer collaboration research, where boys tend to dominate
over girls. A possible explanation for this comes from status theories (e.g. Lee, 1993), which suggest
that if a task is perceived as being within the domain of expertise of one particular gender, that gender
will dominate in a mixed gender setting. The previous peer collaboration research has focused on
science-based tasks which could be perceived as being more 'for boys'. Music in schools is perhaps
seen as more 'for girls', and may help explain female verbal domination in the present study (Archer,
1992).
STUDY 2
COMPOSE A PIECE OF MUSIC THAT HAS A BEGINNING, MIDDLE
AND END: A FORMAL MUSIC COMPOSITION TASK
It was suggested that the children's ideas in study 1 might have been adequately expressed verbally
given the verbal nature of the task. The task in study 2 was a formal music composition task that
required the children to work directly with musical structure and form, and moved away from the
direct representation of external events. It was proposed that with a formal composition task, musical
interaction will be related to group productivity and that verbal interaction will show no relationship.
The two key gender issues are again examined here: whether one gender consistently takes control of
the task verbally and non-verbally in the mixed gender groups, and the relative productivity of single
and mixed gender groups.
METHOD
Seventy-two children aged 9-11 were taken from a second primary school. The same procedure used
in study 1 was used in study 2. The only difference was the composition task. In this study, the
children were asked to work together to compose a piece of music that has a beginning, middle and
end. The quality of the compositions was assessed by three raters using the semantic differential scale of Hargreaves et al. (1996), presented below.
UNEVOCATIVE 1 2 3 4 5 6 7 EVOCATIVE
DULL 1 2 3 4 5 6 7 LIVELY
UNVARIED 1 2 3 4 5 6 7 VARIED
UNORIGINAL 1 2 3 4 5 6 7 ORIGINAL
INEFFECTIVE 1 2 3 4 5 6 7 EFFECTIVE
UNINTERESTING 1 2 3 4 5 6 7 INTERESTING
UNAMBITIOUS 1 2 3 4 5 6 7 AMBITIOUS (adventurous)
DISJOINTED 1 2 3 4 5 6 7 FLOWING (articulate)
AESTHETICALLY UNAPPEALING 1 2 3 4 5 6 7 AESTHETICALLY APPEALING
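The scale above gives each rater nine 7-point bipolar ratings per composition. The paper does not specify how these were aggregated into a single group score; a minimal sketch, assuming the simplest rule of equal weighting across items and raters (an assumption, not the authors' stated method), would be:

```python
def composite_score(ratings):
    """Grand-mean composite from per-rater lists of nine 1-7 item scores.

    Assumes equal weighting of the nine bipolar items and of the raters;
    the paper does not state the actual aggregation rule.
    """
    per_rater = [sum(items) / len(items) for items in ratings]
    return sum(per_rater) / len(per_rater)

# Hypothetical ratings for one composition from three raters
raters = [
    [5, 6, 4, 5, 6, 5, 4, 5, 6],  # rater 1
    [4, 5, 5, 4, 5, 4, 5, 4, 5],  # rater 2
    [6, 6, 5, 5, 6, 5, 5, 6, 6],  # rater 3
]
print(f"group score: {composite_score(raters):.2f}")
```

Other aggregation rules (e.g. summing items, or weighting some dimensions more heavily) are equally compatible with the scale as presented.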
RESULTS
Verbal and musical interaction
Table 3 shows that a significant relationship was found between group productivity and task directed play, but not between group productivity and task directed talk, and there was significantly more play than talk.
Table 3: Pearson correlations between the process variables and the group score.
Gender
In the mixed gender groups, neither gender consistently took control of the task. Table 4 shows the
mean marks awarded to the compositions. The all-girl groups were awarded the highest marks,
followed by the all-boy groups and the mixed gender groups. These differences were not significant.
Table 4: Mean group scores by gender composition.
DISCUSSION
Verbal and musical interaction
In this study, there was a significant relationship between group productivity and task directed play.
This suggests that, in line with study 1, musical interaction was important for productivity. However,
in contrast to study 1, no relationship was found between group productivity and task directed talk,
and there was significantly more play than talk. It is suggested here that this was due to the nature of
the task. The formal music composition task required the children to work directly with musical
structure and form rather than with the direct representation of external events. The children's ideas
were more efficiently expressed directly through the music and not through verbal discussion.
Gender
In this study, neither gender consistently took control of the task in the mixed gender groups. The
boys and girls may have felt on a more equal footing in this study than in Study 1 in terms of their
ability to tackle the task. Perhaps this type of task was one to which the boys were better able to relate
and may be more akin to the sort of music they enjoy. Or perhaps it was a type of task to which the
girls were less able to relate. This requires further investigation.
STUDY 3
COMPOSE A PIECE OF MUSIC THAT WILL MAKE ME HAPPY: AN
EMOTION-BASED COMPOSITION TASK
Studies 1 and 2 have looked at children working on a representational and formal music composition
task respectively. With the representational music composition task, where the stimulus was highly
verbal, both verbal and musical interaction were related to productivity. With a formal music
composition task, while musical interaction was related to productivity, verbal interaction was not. It
was suggested that these differences were due to the nature of the task. It is therefore important to
examine a further type of task in order to support these claims. On the basis of the above findings, it is
suggested that with an emotion-based music composition task, musical interaction will have a
significant relationship with group productivity and that verbal interaction will be both less prevalent
and less important.
The two key gender issues are again examined here: whether one gender consistently takes control of
the task in the mixed gender groups, and the relative productivity of single and mixed gender groups.
METHOD
Seventy-two children aged 9-11 were taken from a third primary school. The same procedure as
before was used in this study. The children were asked to work together to compose a piece of music
'that will make me happy'. The quality of the compositions was assessed by three raters using the
Hargreaves et al (1996) scale discussed above.
RESULTS
Verbal and musical interaction
Table 5 shows that a significant relationship was found between group productivity and task directed
play (r=.56, p<.05). No relationships were found between group productivity and task directed talk,
exploratory play, off task talk or interaction with the researcher. A t-test revealed that there was
significantly more play than talk (t=19.76, p<.001).
Table 5: Pearson correlations between the process variables and the group score.
Gender
Neither gender was found to consistently take control of the task in the mixed gender groups. Table 6
shows that the all-boy groups were awarded significantly higher marks for their compositions than the
mixed gender groups (F=5.28, p<.05). The all-girl groups lay in the middle.
Table 6: Mean group scores by gender composition.
DISCUSSION
The results support the hypothesis that musical interaction would be related to group productivity and
that there would be no relationship between group productivity and verbal interaction. It is suggested
that this is due to the nature of the task.
In this study, the all-boy groups were significantly more productive than the mixed gender groups,
with the all-girl groups lying between the two. It was suggested in the discussion of study 1 that the
status theories may explain female verbal domination in the mixed gender groups, and that music in
schools may be perceived as being more 'for girls' than 'for boys'. The picture may not be quite so
clear cut: rather than the subject as a whole being gender specific, it may be particular tasks within it that are. There is some suggestion
that girls prefer to work on tasks that are more structured and verbal in nature (Morgan, 1998). Boys
tend to fare better on more open-ended, creative tasks. The task in study 3 is the most abstract of the
three and may therefore appeal more to the boys than the girls.
GENERAL DISCUSSION
A significant relationship between task directed talk and group productivity was found in study 1, but no such
relationship was found in studies 2 and 3. One possible explanation for these different findings is the
nature of the task. In studies 2 and 3 the children had to work directly with musical form and structure
and thus communication through music was more important. This explanation is only tentative,
because each of the studies was carried out in a different school with different approaches to music
education. More research is needed to establish which of the findings are due to the nature of the task,
and which are due to the differences among the schools.
All three studies showed there was a significant and positive relationship between musical interaction,
as measured by task-directed play, and the quality of the music compositions.
No relationship was found between exploratory play and the quality of the music composition in any
of the three studies. The exact nature and function of what was called exploratory play is still rather
unclear. It was defined as an individualistic form of play, as opposed to play directed towards the
group or towards completion of the task. It was essentially the exploration of the musical instruments.
While this element of play is considered individualistic rather than co-operative, it did not have a
negative relationship with group productivity as would be expected. Rather it showed no relationship
with group productivity. It is therefore dangerous to assume that exploratory play is somehow
detrimental, it may in fact be a vital part of task accomplishment, or have some other role that the
present analysis has not tapped into. It may be an important precursor to task directed play, where the
child may be trying out ideas for him/herself before feeling ready or able to share those ideas with the
rest of the group. What begins its life as an exploration of ideas at the individual level may somehow
make the transition to task directed play at the group level. 'Group score' may not be the most
effective means of assessing its importance.
A fundamental difficulty with the definition of exploratory play was that it did not distinguish
between individualistic playing involving trying out ideas, and simply 'messing around' with the
instruments. On a behavioural level, this distinction is problematic to make as it involves inferences of
intention on the part of the child. While exploratory play did not show a clear relationship with group
productivity, high levels of this behaviour were observed in all three studies, and so it would seem
feasible to suggest that it must have some function. Is it improvisation, exploration of ideas,
exploration of the instruments or simply a time wasting activity to avoid working on the task? It is
important to study the elements which make up the category of exploratory play as it may consist of
all of these.
Although all three studies showed that children can make use of musical interaction for the effective
communication of ideas, many questions remain unanswered. Does musical interaction act like verbal
interaction? That is, if the purpose of verbal interaction in collaborating groups is to present ideas and
discuss their alternatives, how is this happening in music? To what extent are ideas presented
musically and subsequently modified musically? Verbal interaction essentially involves reciprocity; to
what extent does this occur in musical interaction? Does one person in the group dominate in their
instrumental playing as sometimes occurs in verbal interaction? These issues require further
investigation.
Given Allison's (1986) argument that problem solving in the arts requires the use of thought patterns
different from those in science, it may have been expected that the children would work in a way that
was different from the way they might approach a science-based task. However, composition is a
form of problem solving, where a problem is set up and decisions are taken to solve it, resulting in
the satisfaction of having answered it (Salaman, 1988). While it is accepted that there
may be infinite solutions to this problem, the results of the present research suggest that the work
needed to complete the task may involve similar processes to those observed in science-based tasks.
That is, behaviourally, the same factors found to be responsible for productivity in science-based tasks
account for productivity in music composition tasks, namely the communication of ideas and the
establishment of a shared social reality.
Gender Composition of the Collaborating Group
The studies concentrated on two main gender issues: firstly, the finding of previous research that boys
in mixed gender groups dominate verbally and non-verbally over girls; and secondly, the suggestion
that mixed gender groups tend to be less productive than single gender groups. Neither finding was
consistently replicated here: girls dominated verbally in the mixed gender groups in study 1, while no
gender dominance was observed in studies 2 and 3; and single gender groups significantly
outperformed mixed gender groups only in study 3, where the all-boy groups scored highest.
SUMMARY
In sum, the present research has been concerned with children's collaborative music composition, with
the principal aim of establishing which factors within groups of children are important for group
productivity. Previous peer collaboration research has suggested that the most important element of
task activity within groups is the dialogue among group members. In the three studies reported here,
the importance of verbal communication was found to be dependent on the composition task. The
present research also showed that this 'dialogue' could occur musically, that is through the music itself
rather than through words. Thus, talking about music composition is not always productive, and there
is no substitute for the experience of the music itself.
References
Allison, B. (1986). Some aspects of assessment in art and design education. In M. Ross (Ed.)
Assessment in arts education: A necessary discipline or a loss of happiness. Pergamon Press, Oxford.
Archer, J. (1992). Gender stereotyping of school subjects. The Psychologist: Bulletin of the British
Psychological Society, 5, 66-69.
Barrett, M. (1995). Children composing: What have we learnt? In H. Lee & M. Barrett (eds.), Honing
the Craft: Improving the Quality of Music Education. Conference proceedings of the 10th National
Conference of the Australian Society for Music Education, Hobart: Artemis Publishing Consultants.
Doise, W., & Palmonari, A. (Eds.) (1984). Social interaction in individual development. Cambridge:
Cambridge University Press.
Fitzpatrick, H. & Hardman, M. (1995). Gender and the classroom computer: Do girls lose out? In H.
C. Foot, C. J. Howe, A. Anderson, A. K. Tolmie, & D. A. Warden (Eds.), Group and interactive
learning. Boston: Computational Mechanics Publications.
Hargreaves, D.J., Galton, M. J. & Robinson, S. (1996). Teachers' assessments of primary children's
classwork in the creative arts. Educational Research, 38 (2), 199-211.
Lee, M. (1993). Gender, group composition and peer interaction in computer-based co-operative
learning. Educational Computing Research, 9 (4), 549-577.
Morgan, L.A. (1998) Children's collaborative music composition: Communication through music.
Unpublished doctoral dissertation, University of Leicester.
Pheasey, K. & Underwood, G. (1995). Collaboration and discourse during computer-based problem
solving. In H. C. Foot, C. J. Howe, A. Anderson, A. K. Tolmie, & D. A. Warden (Eds.), Group and
interactive learning. Boston: Computational Mechanics Publications.
Rogoff, B. (1998). Cognition as a collaborative process. In W. Damon, D. Kuhn, & R. S. Siegler
(eds.) Handbook of Child Psychology: Cognition, Perception & Language (5th Edition). (pp.
679-744), New York: Wiley.
Salaman, W. (1988). Personalities in world music education. No. 7 - John Paynter. International
Journal of Music Education, 12, 28-32.
Swann, J. (1992). Girls, boys and language (First ed.). Oxford: Blackwell.
Swanwick, K. (1979). A basis for music education. Windsor: NFER.
Tolmie, A., Howe, C., Mackenzie, M. & Greer, K. (1993). Task design as an influence on dialogue
and learning: Primary school group work with object flotation. Social Development, 2 (3), 189-211.
Tudge, J. & Rogoff, B. (1989). Peer influences on cognitive development: Piagetian and Vygotskian
perspectives. In M.H. Bornstein & J.S. Bruner, Interaction in human development. Lawrence Erlbaum
Associates.
Underwood, J. (ed.) (1994). Computer-based learning (First ed.). London: David Fulton Publishers
Ltd.
Wegerif, R., Mercer, N., & Dawes, L. (1999). From social interaction to individual reasoning: an
empirical investigation of a possible socio-cultural model of cognitive development. Learning &
Instruction, 9, 6, 493-516.
Proceedings abstract
EFFECT OF TEMPORAL CONTEXT ON PITCH SALIENCE IN MUSICAL CHORDS
parncutt@kfunigraz.ac.at
Background:
Aims:
Method:
In each trial, music students heard a chord of octave complex tones followed by
a single tone, and rated how well the tone followed the chord. Different
combinations of chord and probe tone were presented in random order and in
random transposition. In a second experiment, the chord was either preceded or
followed by a distractor tone, which listeners were instructed to ignore. When
the distractor followed the chord, the distractor also followed the probe tone.
Results:
We hypothesized that distractor tones would stream with nearby chord tones,
attracting attention to them and increasing their perceptual salience. This was
not confirmed. Instead, peaks were generally observed at both chord and
distractor pitches. More importantly, when the chord and the distractor tone
together created a more familiar tonal fragment or progression, the tone
profile was more clearly structured.
Conclusions:
Temporal context and streaming do not destroy pitches that are implied
according to the pitch model, but not played. This seems to confirm the model's
music-theoretic potential. However, responses were strongly influenced by
listeners' familiarity with specific sound structures that occur frequently in
western music and are not represented in the model.
Proceedings paper
The hypothesis holds that we understand all of the overt gestures of performers (the finger, arm, trunk, and leg movements) via overt and covert
imitation. Overt forms of mimetic participation include toe-tapping, swaying, dancing to music, and singing along with music; covert forms
include subvocalization and other aspects of motor imagery. Because the overt forms are generally more occasional, they are somewhat less
informative than the covert forms, which, according to the evidence presented below, appear to occur regularly as an automatic part of music
perception and cognition.
Notice that the mappings are one-directional, from the source domain of the human voice to the target domain of instrumental sounds.
Although singers may sometimes speak of their "instrument," this reverse mapping of instrumental sounds onto the voice is far less common.
Now if it were simply a matter of vocal and instrumental sounds being alike, then the mappings might go in either direction. But in addition to
whatever acoustic similarities there may be, the voice provides most of us with an experiential basis for understanding the majority of
instrumental sounds. The unidirectionality of the mappings, from this generally applicable vocal experience onto the more specific cases of
instrumental sounds, is consistent with the mimetic hypothesis. The mappings of the conceptual metaphors MUSICAL SOUNDS ARE VOCAL
SOUNDS and INSTRUMENTAL SOUNDS ARE VOCAL SOUNDS indicate that part of how we understand musical sounds generally is in terms of
our own vocal experience, while the mimetic hypothesis holds that the process whereby we perform these cross-domain mappings involves,
and is perhaps motivated by, mimetic participation on the part of listeners.
Implications
If the mimetic hypothesis is correct, it holds fundamental implications for various aspects of musical meaning. At a broad, philosophical level,
it helps show explicitly what role embodied experience plays in the imagination of what might otherwise seem like autonomous, objective
musical properties, such as musical verticality. We normally treat the concept of musical verticality as if it were literal ("music-literal," to use
Guck's (1991) term), while perhaps acknowledging that on some level it is metaphoric. One problem with this is the implicit suggestion that
musical tones simply have some property, which we understand metaphorically as "high" and "low," and which we perceive in ways that only
incidentally involve embodied experience, a position that might be put: "Yes, we have ears with which we hear, but this does not determine the
property of the tones which we understand in terms of verticality." The mimetic hypothesis and analysis of the metaphor, however, say
something quite different.
Tones are neither "high" nor "low," and they neither "ascend" nor "descend," and yet so much music discourse and meaning depend on the
imagination that they somehow do. As I have explained elsewhere (Cox 1999), we can understand the logic of this metaphor in terms of the
conceptual metaphor GREATER IS HIGHER (Lakoff and Johnson 1980) and the folk theory to increase is to raise, whereby we regularly
understand greater and lesser quantities and magnitudes in terms of vertical relations. In the same way, "higher" notes are, by and
large, produced via greater quantities and magnitudes of air, effort, and tension. There are exceptions of course, but when we recognize the role of vocal
mimesis, the exceptions become even fewer. If we understand musical sounds in general in terms of our own vocal experience, and that vocal
experience involves greater and lesser amounts of air, effort and tension, then we have a basis for understanding sounds in general
metaphorically in terms of "higher" and "lower." This view of things brings together embodied musical experience, in terms of mimetic
participation, and the embodied metaphoric reasoning that we use every day, in order to account for the fundamental concept of musical
verticality which heretofore has not been explained beyond the level of identifying it as a metaphor. The further implication of this view is
that any concept and any claim about music based on the concept of verticality (whether in terms of melodic motion or shape) describes not a
property of the music itself, but an interpretation based in part on the imagined (re)production of the sound described, understood via the logic
of the conceptual metaphor GREATER IS HIGHER. Musical properties that might otherwise seem to be located in the music itself are instead
shown to emerge in the imagination of listeners, as we draw on embodied experience and the logic of metaphoric thought. This claim has
further philosophical implications with regard to the autonomy of musical works and related issues, but I will proceed to the more directly
References
Armstrong, D. F., Stokoe, W. C., and Wilcox, S. E. (1995). Gesture and the Nature of Language. Cambridge: Cambridge University Press.
Baddeley, A. (1986). Working Memory. Oxford, Clarendon Press.
Baddeley, A., and Logie, R. (1992). Auditory imagery and working memory. In D. Reisberg (Ed.). Auditory Imagery. Hillsdale, NJ, Lawrence
Erlbaum.
Carpenter, P. (1967). The musical object. Current Musicology, 5, 56-87.
Clifton, T. (1983). Music As Heard. New Haven, Yale University Press.
Cox, A. (1998). As time goes by: the past, present, and future of musical motion. Society for Music Theory, Chapel Hill.
Cox, A. (1999). Verticality, conceptual blending, and the mimetic hypothesis. Society for Music Theory, Atlanta.
Crowder, R. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human Perception and Performance, 15, 472-78.
Crowder, R. G., and Pitt, M. A. (1992). Research on memory/imagery for musical timbre. In D. Reisberg (Ed.). Auditory Imagery. Hillsdale,
NJ, Lawrence Erlbaum. pp. 29-44.
Cumming, N. (1997). The subjectivities of 'Erbarme Dich'. Music Analysis, 16/1, 5-44.
Fadiga, L., Buccino, G., Craighero, L., Fogassi, L., Gallese, V., and Pavesi, G. (1998). Corticospinal excitability is specifically modulated by
motor imagery: a magnetic stimulation study. Neuropsychologia, 37/2, 147-158.
Fadiga, L., and Gallese, V. (1997). Action representation and language in the brain. Theoretical Linguistics, 23/3, 267-280.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti G. (1996). Action recognition in the premotor cortex. Brain, 119, 593-609.
Gallese, V., and Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2/12,
493-501.
Gathercole, S. E., and Baddeley, A. D. (1993). Working Memory and Language. Hillsdale, NJ, Lawrence Erlbaum.
Gibson, E. J., and Levin, H. (1975). The Psychology of Reading. Cambridge, MA, The MIT Press.
Grafton, S. T., Fadiga, L., Arbib, M. A., and Rizzolatti, G. (1997). Premotor cortex activation during observation and naming of familiar tools.
Neuroimage, 6/4, 231-236.
Guck, M. (1991). Two types of metaphoric transfer. In J. C. Kassler (Ed.). Metaphor: A Musical Dimension. Sydney, Currency Press.
Lakoff, G., and Johnson, M. (1980). Metaphors We Live By. Chicago, University of Chicago Press.
Larson, S. (1993). Modeling melodic expectation: using three "musical forces" to predict melodic continuations. Proceedings of the Fifteenth
Annual Conference of the Cognitive Science Society, 629-34.
Lidov, D. (1987). Mind and body in music. Semiotica, 66/1, 66-97.
Lochhead, J., and Fisher, G. (1997). Performance and gesture: on the projection and apprehension of musical meaning. Society for Music
Theory Annual Meeting, Phoenix.
Logie, R., and Edworthy, J. (1986). Shared mechanisms in the processing of verbal and musical material. In D. G. Russell, D. Marks, and J.
Richardson (Eds.). Imagery 2. Dunedin, New Zealand, Human Performance Associates. pp. 33-37.
McClary, S. (1991) Feminine Endings: Music, Gender, and Sexuality. Minneapolis, University of Minnesota Press.
Mead, A. (1996). Bodily hearing. Annual Meeting of the Society for Music Theory, Baton Rouge.
Neisser, U. (1976). Cognition and Reality: Principles and Implications of Cognitive Psychology. New York, Freeman.
Walton, K. (1993). Understanding Humor and Understanding Music. In Krausz, M. (Ed.). The Interpretation of Music: Philosophical Essays.
Oxford, Clarendon Press. pp. 259-70.
Walton, K. (1997). Listening with imagination: is music representational? In Robinson, J. (Ed.). Music and Meaning. Ithaca, Cornell
University Press. pp. 57-82.
Young, I. (1990). Throwing Like a Girl and Other Essays in Feminist Philosophy and Social Theory. Bloomington, Indiana University Press.
Back to index
Proceedings paper
The Relation of body movement and voice production in early childhood music learning
Wilfried Gruhn, Musikhochschule Freiburg, Germany
Background
Observational and experimental research on early music learning (Gordon 1990; Wilson & Roehmann 1990; Deliège &
Sloboda 1996; Gruhn 1999) as well as brain studies on music learning (Altenmüller & Gruhn 1997; 1999; Liebert &
Gruhn 1999) have demonstrated that the brain's high degree of plasticity gives it a powerful potential that is
fundamental to learning. Although all primary sensory cortices are genetically predetermined, the extension and neuronal
connectivity within particular brain areas vary over time according to experience and practice. In learning, humans
profit from this plasticity, which enables the brain to modulate different functions in response to new demands.
This neurobiological potential underlies the "competent infant" (Dornes 1993), whose competence is grounded in
the ability to develop dynamic neural networks within a given genetic program and to form mental representations of all
kinds of experiences and incoming information.
Since cognitive psychology has described different types of mental representation as "figural" and "formal" (Bamberger
1991) depending upon what aspects the perceptive mind focuses on,
and since EEG studies have identified different activation patterns that can be related to those types, it is reasonable to
assume that the very nature of infant learning centres on the genetically determined and environmentally stimulated
growth of neuronal connections, which form genuine musical representations as their neuronal correlates. In a long-term
observational study over more than five years (Gruhn 1999; 2000) we exposed groups of young children to various
materials derived from Gordon's learning theory (Gordon 1980; 5th ed. 1997) and applied a language acquisition model to the
informal teaching and learning process. One of the goals of the longitudinal study was to examine the interaction of
motor skills with musical activities such as singing in tune and chanting rhythmically. What educators and musicians like
Jaques-Dalcroze, Laban, Jacoby, and Gordon have assumed intuitively or by observation - that different dimensions of
body movement reflect musical experience and procedural knowledge - was to be tested experimentally.
Experimental study
Subjects
Parents of children from birth up to two years of age from the municipal area of Freiburg volunteered for the study, which ran for 15 months
from October 1998 until December 1999. Children were selected from a larger sample according to the criteria of age (1
year +/- 3 months) and gender (balanced distribution of male and female). Of a total of 13 children we lost 4 because
the parents moved away or for other reasons. Finally, 9 children (M = 19.5 months, SD = 5.83) completed the study. The social
structure was not representative of the average population; all children grew up in musically active families and
exhibited - as reported by the parents - high musical sensitivity (attraction to music, conscious listening, spontaneous
movement along with music). A control group (9 children from a nursery, M = 23.2 months, SD = 3.15) was matched with the
experimental group as to age, gender and social background.
Procedure
Children with their parents or caregivers participated in informal instruction once a week for 30 minutes. The entire
teaching period of 15 months was segmented into four 10-week sections. The materials presented were children's songs with
words, tunes without words, chants with words (nursery rhymes), chants without words, tonal patterns, and rhythm
patterns. Songs included all tonalities (major, minor, modes); chants were in duple and triple meter and included
unusual combinations of either. Whatever type was presented, singing was always accompanied by body movements.
However, movement did not necessarily reinforce beats; rather, all movement focused on continuous flow and weight.
As long as the teacher kept eye contact with a child, (s)he persisted in presenting the same sound pattern to the child.
Additional materials (such as scarves, balls, hoops, and a trampoline) were introduced insofar as they were apt to support the
experience of flow and weight.
All sessions were videotaped; additionally, the children were observed individually by two independent judges
(interjudge reliability r = .88) using the structured observation form CBOF. Each child's data were averaged over successive
4-week periods. For analysis, the means at the beginning and end of each 10-week section were taken from the CBOF. In
all, data from 8 measurements (2 for each of the 4 sections) were related to the means of the control group, which did not receive
any musical instruction whatsoever. However, teachers spent approximately the same time with the control children on the playground
and in other classroom activities, to gain the children's confidence and to avoid any attention effect in the study group.
Additionally, parents reported regularly at the end of each section by means of a questionnaire.
Results
The behaviour over time exhibits slow changes in some areas, such as type of movement (explorative, imitative, creative),
quality of movement (flow, coordination, synchronisation, expressiveness), and voice response (rhythm patterns) (Fig.
1). Since the children started at a very early age, one cannot expect much progress in voice production (singing and
chanting), and only little enhancement in movement at the beginning, when children are just beginning to stand freely and walk alone.
Accordingly, only after new models of movement were introduced during the first section do we find a significant change at the
end of that section (flow p = .000; coordination p = .020; synchronisation p = .042). The major and most obvious
development in the quality of movement happens only during the second half of informal instruction. In contrast, the
readiness for sound reproduction grows slowly. Children need a long time to perceive structured patterns and
process them before they start imitating them beyond mere babble. Therefore, significant improvement in sound
reproduction - if observed at all - is only to be recognised in rhythm patterns.
The development of each particular skill (e.g. coordination, synchronisation, accuracy, consistency, intonation, pitch
discrimination, expressiveness) within any criterion (movement, tonal and rhythm response, imitation, improvisation,
and audiation) does not proceed continuously, but with accelerations and decelerations (Fig. 2). Moreover, voice and body
movement progress at different speeds, depending on many biological, social, and environmental
influences. From sociology we know about U-shaped growth, which is also reflected in several developmental curves
of voice and movement achievement in this study. Each child follows his/her own developmental path, with varying
phases of progress and retardation.
To look for interactions between the various attributes, Pearson correlations were calculated for each of the 45 criteria.
A significant correlation appears only between body movement and voice production, at all measurements throughout the
four sections. In most cases the results are even highly significant (r ≥ .80) (Fig. 3). This indicates an important interaction
between the way children use their bodies in movement and their ability to match pitches accurately and keep a
consistent tempo. The better they can control their bodies, the more precisely they can also control their vocal apparatus.
With growing experience and practice, the correlation then shifts from patterns to songs and tunes at the end of the
long-term observation.
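The correlational analysis described above can be illustrated with a minimal sketch in Python (not part of the original study; the variable names and rating values below are hypothetical stand-ins for CBOF scores across the 8 measurement points):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical mean CBOF ratings for one child across the 8 measurements:
movement_quality = [1.2, 1.5, 1.9, 2.4, 2.8, 3.1, 3.5, 3.8]
voice_response   = [1.0, 1.3, 1.6, 2.2, 2.5, 3.0, 3.3, 3.7]

r = pearson_r(movement_quality, voice_response)
print(round(r, 2))  # a high positive r, of the kind reported (r >= .80)
```

A correlation this strong says nothing about causal direction, which is why the Discussion below is careful to treat the movement-voice link as an interaction rather than an effect.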
Most interestingly, children in the control group show exactly the same direction in their development, but at a lower
level, although they started from exactly the same level as the children in the study group. Changes in their
musical behaviour over the entire period are nonsignificant except for imitation of movements (p = .009) and coordination
(p = .001).
Figure 4
Compared development of movement of children in the study group (S) with those of the control group (C) at the beginning (measurement 1)
and at the end of each section (measurements 2 - 5).
Because of their lack of exposure to a musically enriched environment which offers many opportunities to explore their
body as well as their voice as a means of expression and communication, the control children could only perform
movements according to their biological maturation. In the experimental group that process appeared significantly
enhanced.
Discussion
Although one must take into consideration that correlational findings do not allow causal statements, it is remarkable that
the empirical data support a strong interaction between movement and voice production, which has already been observed
by other music educators. Furthermore, researchers have investigated infants' development of movement and its impact
on early childhood music learning (Metz 1989; Blesedell 1991; Hicks 1993; Reynolds 1995). However, it is not easy to
interpret the data presented here adequately. One might assume that there is a common neuronal basis for connected
mechanisms underlying the performance of movement and vocal sound, but there is no neurological evidence for such
an assumption. Rather, it is evident that precise control of the efferent neuronal transmission that governs motor coordination
affects both general motor skills for moving arms and legs and the fine motor skills needed for controlling the
adjustment of the vocal cords. Movement supposedly facilitates somatosensory and sensorimotor stimulus transmission
and enhances the primary sensorimotor cortex, which in turn affects muscular motor processes in the larynx and
enables it to react physically to sound by matching a perceived pitch with the vocally produced pitch. If these
motor skills are developed properly, correct voice production can function more easily. EEG, MEG and fMRI studies
might demonstrate whether there is a neuronal basis for this interaction and how it works. For now one can conclude that
music training at an early age is best supported by integrating body movement.
The salient interaction of body movement and voice production can also be interpreted in terms of transfer effects. If we
differentiate between internal and external transfer effects (Gruhn 2000), the interaction may be based on an internal
effect that connects different brain functions to a more complex network. However, strong research data which support
this hypothesis are still missing.
More likely, Condon's observation (1975) that infants immediately after birth respond to their mother's voice with synchronous
movements parallels our findings. From long-term behavioural observations of newborns he concludes that infants learn
their mother tongue even before the age of actual language acquisition by first imitating the structure of movements
along with their mothers' speech. Rhythmically structured vocal sequences are - according to Condon - basically perceived as
rhythmically structured movement patterns. Investigations of mother-infant communication (Malloch 1999/2000;
Trevarthen 1999/2000) support the evidence of rhythmically structured patterns in terms of call and response and of
children's exploration of pitch ranges. Those interactions, which are "built from the units of pulse and quality found in
the jointly created gestures of vocalisations and bodily movement" (Malloch 1999/2000, 45), are fundamental to the
development of skills necessary for communication through sound production.
References
Altenmüller,E. & Gruhn,W. (1997). Music, the brain, and music learning. Chicago: G.I.A. Publ.Inc. (GIML series
vol. 2)
Altenmüller, E., Gruhn, W., Parlitz, D. (1999). Was bewirkt musikalisches Lernen in unserem Gehirn? In
H.G.Bastian (Ed.). Musik begreifen. Mainz: Schott. pp. 120 - 143.
Bamberger, J. (1991). The mind behind the musical ear. Cambridge: Harvard Univ.Press.
Blesedell, D.S. (1991). A study of the effects of two types of movement instruction on the rhythm achievement and
developmental rhythm aptitude of preschool children. Dissertation Abstracts International, 52 (07), 2452.
Condon, W. (1975). Speech makes babies move. In R.Lewin (Ed.). Child alive. New York: Anchor. pp. 75 - 85.
Deliège, I. & Sloboda, J. (Eds.) (1996). Musical beginnings. Origins and development of musical competence.
Oxford: Oxford Univ.Press.
Dornes, M.(1993). Der kompetente Säugling. Frankfurt: Fischer.
Gordon, E.E. (1980). Learning sequences in music. Chicago: G.I.A.Publ.Inc. ( 5th edition 1997)
Gordon, E.E. (1990). A music learning theory for newborn and young children. Chicago: G.I.A. Publ.Inc.
Gruhn, W. (1999). The development of mental representations in early childhood. In Suk Won Yi (Ed.). Music,
Mind, and Science. Seoul: Seoul Nat.Univ.Press. pp.434 - 451.
Gruhn, W. (2000). Does brain research support the hope for musical transfer effects? SRPMME Conference,
Leicester UK (mscr.)
Hicks, W.K.(1993). An investigation of the initial stages of preparatory audiation. Dissertation Abstracts
International, 54 (04), 1277.
Liebert, G., Gruhn, W. et al. (1999). Kurzzeit-Lerneffekte musikalischer Gehörbildung spiegeln sich in kortikalen
Aktivierungsmustern wider. Proceedings of the Deutsche Gesellschaft für Musikpsychologie. Karlsruhe.
Malloch, S.M. (1999/2000). Mothers and infants and communicative musicality. Musicae Scientiae. Special Issue,
29 - 54.
Metz, E. (1989). Movement as a musical response among preschool children. Journal of Research in Music
Education, 37 (1), 48 - 60.
Reynolds,A.M. (1995). An investigation of the movement responses performed by children 18 months to 3 years
of age and their caregivers to rhythm chants in duple and triple meters. Dissertation Abstracts International, 56
(04), 1283.
Trevarthen, C. (1999/2000). Musicality and the intrinsic motive pulse: evidence from human psychobiology and
infant communication. Musicae Scientiae. Special Issue, 155 - 211.
Wilson, F.R. & Roehmann, F.L. (Eds.) (1990). Music and child development. St.Louis: MMB Music.
Back to index
Proceedings
Convenors:
Drake,C.,
Palmer,C.
Chair: Palmer,C.
Papers in this session (titles reconstructed from the flattened programme grid; "(Abstract)" placement is approximate):
- Development of musical organisation in children's music notations
- More on the meaning of natural schemata - their role in shaping types of directionality (Abstract)
- How much time do we need to process harmonic structures (Abstract)
- Music cognition illuminates our understanding of the experience of film
- Student practice habits in the United States and Japan
- The effect of musical styles and experiences on melodic expectancy (Abstract)
- Narrative and musical time: children's perceptions of structural norms
- Musical meaning and the line of fifths (Abstract)
- Effect of harmonic relatedness on the detection of temporal asynchronies
- When your ear sets the stage: musical context effects in film perception
- Musical and extra-musical factors and the career of professional musicians: the perspective of employment agencies (Abstract)
- Melodic expectancies in 7 and 8 year olds (Abstract)
- The distance and ratio model of pitch perception (Abstract)
- Musical timing data as performance gesture
- Music, mood and memory
- Preparing professional performers? Music students' perceptions and experiences of their orchestral training at Birmingham Conservatoire
- Similarity perception of variations of tonal and twelve-tone melodies
Proceedings abstract
Eva Brand
evabrand@netvision.net.il
Background:
Aims:
The aim of this study was to identify and describe the development of musical
organization found in children's notations of the song.
Method:
A total of 36 children participated in the study, drawn from three age groups:
6, 9 and 12 year-olds. The children were asked to learn an unfamiliar Zulu song
through independent use of a tape recording of the song, a tape player, a xylophone,
a drum, and paper and colored pens. Making a visual representation of the song was one
of the learning strategies they used.
Results:
Conclusions:
Musical notations contain "more than meets the eye".
Back to index
Proceedings paper
A. INTRODUCTION
The topic of this paper links certain aspects of three partly overlapping concepts: meaning in music, natural schemata, and types of
directionality. Although much has been written about them, they require clarification. We assume here that meaning in music refers
to the types of experiences evoked by the music. We classify these types from three overlapping standpoints
that may contribute to characterizing and distinguishing between cultures, periods, or composers, since they represent different
messages or ideals: (1) the existence or absence of a direct link with the extramusical world (as expressed in "functional,"
"programmatic," and other music, as opposed to "absolute" music); (2) association with one of two poles: "ethos" or "pathos" (to use
the terminology of Curt Sachs [1946]; these poles can also be termed "tranquility/excitement," "clarity/blurring," or
"Classical/Romantic"); (3) types of directionality and complexity.
"Directionality," in the most general terms, can be said to represent the sense of certainty as to the continuation of the musical
progression on various levels of musical organization. It is related to expectations that may arise from learned and natural schemata
and how they are realized. The concept of directionality has been referred to in theoretical discussions by many terms, including
"progression," "process," "processive forms," "flow," "goal," "direction," and "approaching." (For a summary of studies on
expectancy in music from a historical perspective, beginning in 1903, see Carlsen 1990.) Different styles have different forms of
directionality: momentary, overall, clear, suspensive (or blurred), and various combinations thereof, and these forms of directionality
B. NATURAL SCHEMATA
Natural schemata that are related to universal phenomena can be regarded as complementary contrasts of learned schemata. They are
familiar from outside music, too, and are not expressed in precise terms. (The concept of the schema, a term that was coined by
Bartlett [1932] and is in fairly common use today, is still referred to by somewhat equivalent terms such as archetype, prototype,
alphabet, model, and "structured system of knowledge.")
Despite their tremendous contribution to all types of experiences, natural schemata have been largely neglected in discussions and
analyses of music, especially Western tonal music. Western musical theory focuses almost exclusively on the learned schemata and
not on their realization through natural schemata. In non-Western cultures, some natural schemata are integral parts of musical
theory. Another reason why the natural schemata are neglected or ignored is that they are not defined in quantitative terms.
Awareness of them increased in the twentieth century, as they became more important in the organization of pieces of music at the
expense of the learned schemata and as research into music and cognition and various universal phenomena developed substantially.
Many analyses of twentieth-century music are based on natural schemata, as manifested in the texture in its broad sense, both by
theoreticians (Lansky 1974; Goldstein 1974; Scolnic 1993) and by the composers themselves (Stockhausen 1957, Varèse 1967;
Boulez 1971; Ligeti 1993). In contrast, few analyses of Western tonal music have been done based on natural schemata (Lorince
1966; Rothgeb 1977; Ratner 1980; Levy 1982), although there have been many studies on various aspects of natural schemata,
especially those related to expectations and realization of them (Meyer 1975; Narmour 1991; Krumhansl 1997; Yeger-Granot 1996;
to mention just a few) and curves of pitch (see below). Here we would like to discuss an additional aspect of natural schemata and
examine their selection in light of the stylistic ideal.
Due to the large number of parameters and combinations thereof, there is a huge variety of manifestations of natural schemata. In
order to understand their contribution to shaping perceptions of types of directionality, we will attempt to define cognitive principles,
and in light of them, we will examine the meaning of the natural schemata.
We can see the manifestations of natural schemata in four main realms:
1. The absolute and relative range of occurrence of various parameters, with treatment of the normative range of expectations
following an inverted U function (on the U function with respect to musical phenomena, see Hargreaves 1986). Any deviation
in either direction (greater or less) runs counter to natural expectations, spoils the clear directionality, and elicits tension
(therefore music that is very low, very slow, or without change also produces tension). Extensive deviation introduces another
factor-infrequency/frequency-that affects the perception of expectations.
The relative range is represented by the ambitus-medium, large, or small-which may appear in any of the absolute ranges for
each parameter.
2. Curves of change over time for each parameter. Most research on curves has focused on the parameter of pitch and involved
various topics: musical perception and memory (e.g., Dowling 1978; Edworthy 1985; Andrews and Dowling 1991); the
contribution to musical structure (Eitan 1997), especially to the characterization of folk songs (Nettl 1977; Huron 1996); the
curves as gestures that contribute to emotional expression, as a component of expectations that arise in melodic progressions,
and as an important factor in determining the parameters that accompany it (changes in intensity and duration) during
performance (Repp 1998; Sundberg et al. 1991). In general, one can speak of six meaningful basic curves: ascending,
descending; convex, concave; flat; and zigzag. These curves may appear on various levels and in various combinations. The
convex curve (also called "arch") represents a natural model of predictability as to the continuation of the musical progression.
It may be regarded as defining a closed unit in which tension (or excitement) results in relaxation, where tension and
relaxation may be realized as contrasting pairs in various parameters: ascent-descent; crescendo-diminuendo;
accelerando-ritardando; sudden change (e.g., a skip)-gradual change (scale steps).
Both random zigzag and horizontal (no change) curves portray lack of organization (based on difference and similarity [see
Tversky 1977]). They therefore hinder clear directionality and cause stress, due to expectancy of some sort of repetition in the
case of zigzag and expectancy of change in the horizontal case. One can also say that in the case of continuous exact repetition
(A A A ...) "before" and "after" are meaningless, and the units are interchangeable. (To be more precise, the units are never
identical in the sense that each appears after a different number of A's).
It is important to distinguish between kinds of repetition. For example, a single repetition (with varying degrees of precision),
which underlies the "Classical period" (and 2^n organization), enhances directionality; multiple repetitions distinguish at least
between various levels, such as repeated background events (e.g., meter) and repeated frontal events. The first helps to
reinforce a sense of directionality in the changing frontal events, whereas the second evokes excitement due to expectation of
change (up to a certain threshold at which one despairs of change, when repetition turns the frontal events into background).
The type of repetition also depends on the length of the repeated unit and events on various levels.
3. The degree of definability of all the above-elements, events, curves, units, etc.-depends on psychoacoustic and cognitive
constraints, and, of course, it affects the perception of clear directionality. As for the definability of elements such as the notes
of the scale, intervals, chords, and rhythms, a precondition is the existence of clear categories for our perceptions (Burns et al.
1978; Rakowski 1990; Kefe et al. 1991). The categories, for their part, depend on the quantitative aspect and the existence of a
hierarchy. For example, an examination of hierarchy and the conditions for coherence in connection with the interval system
showed a preference, in terms of possible forms of organization, for the specific division of the octave into twelve and seven,
as in the West, over other hypothetical divisions (Balzano 1980; Burns 1981; Agmon 1989). Thus we can regard some learned
schemata as natural schemata, since they are the outcome of cognitive activity.
As for the definability of events or units based on combinations of parameters, this depends on the degree of concurrence or
nonconcurrence of the parameters (Cohen and Wagner 2000). Full concurrence, for example, is obtained when a note that
appears on a stressed beat in the measure serves as a peak for its neighbors in pitch, duration, and intensity. In such a case
there is no doubt about its salience, which corresponds to its location on a stressed beat. In contrast, in all states of
nonconcurrence (seven possibilities in all), the degree of definability regarding its salience is smaller.
Concurrence/nonconcurrence may be seen between different learned schemata, between natural schemata, and between
learned and natural schemata. Nonconcurrence heightens complexity, uncertainty, and excitement.
4. Categories of operations that denote the principle of change that occurs in a transformation. Transformation, by its very
nature, includes the two conditions required by every form of organization: difference and similarity; and repetition. It can be
characterized by the parameters in which it occurs, the level (immediate or overall), the degree of change, and most
importantly, the operation. We can group the operations in five categories-contrast, shift, reduction/expansion,
segregation/grouping, and equivalence-which represent categories of cognitive operations; consequently, we can regard them
as natural schemata (Apel 1993).
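The six basic curves named above (ascending, descending, convex, concave, flat, zigzag) lend themselves to a simple operational test on a pitch sequence. The following sketch (Python; not part of the original paper, and the tolerance parameter is a hypothetical simplification of the authors' levels of analysis) checks the endpoints and the interior extremum of a sequence:

```python
def classify_curve(pitches, tol=0):
    """Assign a pitch sequence to one of the six basic curve types:
    ascending, descending, convex (arch), concave, flat, or zigzag."""
    def nondecreasing(seq):
        return all(b >= a - tol for a, b in zip(seq, seq[1:]))
    def nonincreasing(seq):
        return all(b <= a + tol for a, b in zip(seq, seq[1:]))

    if max(pitches) - min(pitches) <= tol:
        return "flat"                      # no change: expectancy of change
    if nondecreasing(pitches):
        return "ascending"
    if nonincreasing(pitches):
        return "descending"
    peak = pitches.index(max(pitches))     # rise to a peak, then fall: the arch
    if nondecreasing(pitches[:peak + 1]) and nonincreasing(pitches[peak:]):
        return "convex"
    trough = pitches.index(min(pitches))   # fall to a trough, then rise
    if nonincreasing(pitches[:trough + 1]) and nondecreasing(pitches[trough:]):
        return "concave"
    return "zigzag"                        # random alternation: no clear directionality

print(classify_curve([60, 62, 64, 65, 67]))      # ascending
print(classify_curve([60, 64, 67, 64, 60]))      # convex ("arch")
print(classify_curve([60, 65, 58, 66, 59, 64]))  # zigzag
```

As in the text, the same test can be applied recursively: classify the immediate level note by note, then classify the sequence of unit peaks to obtain the curves of the higher levels.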
Fig. 1: The structure of the upper voice in the canon from Krenek's Etude (op. 38, no. 1) in terms of pitch curves (and written
dynamics concurrent with pitch) on several levels.
a. Specification of curves on the various levels. Dots (•) stand for non-repeating notes; diamonds around dots indicate
notes that open a row; solid lines (-) above bar numbers denote pauses; broken lines (- - -) represent the overall curves of
units (convex or ascending); broken vertical lines denote the division between the two subparts in I and II. An oval in I
surrounds the opening subunit of two motives; in II the oval surrounds each of the two motives, which have become separate
units.
b. Overall structure of the piece, in terms of the curves of the parts (I and II are concave, whereas the overall curve is
convex).
In terms of the natural schemata, we can think of the piece as being divided into two parts (I and II) and a finale (see fig. 1). The first
part (eight measures, ending with the end of the third appearance of the row) is divided into two unequal subparts (4 1/2 + 3 1/2
measures). The first subpart has three units that are fairly zigzagged, but their overall curves are convex. The units, which are
separated by rests, become gradually shorter (together with the rests, their durations in number of measures are 2, 1 3/4, and 3/4). All
three are subject to the overall descending curve. The second subpart contains two zigzagged units, with rests before, between, and
after them. In this subpart the ascending curve is salient on several levels. The second subpart may be regarded as a contrast to the
first in terms of the curve: ascent following descent. The overall curve of the first part is concave. Note that the curves on the
different levels, both convex and ascending, are reinforced by appropriate changes in intensity, but, as we mentioned, there is no
concurrence with the rows, except at the end of the first part and at the very end of the piece (altogether the row appears five times,
with a gradual increase in duration).
The second part (the second line in the figure) constitutes a repetition of sorts and is also divided into two subparts. The
first subpart consists only of the subunit that opens I (circled in the figure) and contains two motives. In II it breaks up into two units
(set off by two ellipses in the figure); each of them is expanded by a repetition of notes, an increase in duration, and a pause between
them. The second unit is shorter than the first (the first has six notes and the second has only three). Then, in the second subpart,
there are three ascending lines that become shorter and shorter (the first has eight notes, the second has six, and the third has three)
with an ascent between them, up to a peak (in intensity, too). Like the first part, the second part as a whole is represented by a
concave curve, with a simpler and smaller descent and a stronger ascent. The second part as a whole is ascending. Then comes the
finale, which is marked by segregating staccato and rests between the notes and ends with descent, legato, long notes, and piano (i.e.,
a decrease in all parameters, including density).
Thus we can regard the piece as comprising several levels: the parts (two plus the finale), the subparts, the units of the subparts, and
the immediate level. As for the curves, on the immediate level most of them are zigzags with varying degrees of steepness; on the
level of the units, the curves are convex or ascending; on the level of the subparts, they are ascending or descending; on the level of
the parts, they are concave (in the two parts) and descending (in the finale); on the level of the piece as a whole (b in the figure), the
curve is convex. The operations are contrast, shift, reduction, and intensification. Unquestionably, these natural schemata contribute
to directionality on various levels of the piece, and they fill in what the dodecaphonic system lacks.
D. THE EXPERIMENT
The experiment examined subjects' responses to monophonic, atonal lines intended to represent types of directionality by
means of various characteristics. Atonality was selected in order to prevent competition from learned schemata.
D.1 Method
The following questions were asked: (1) Is the pattern a closed one (in which case the unit is coherent and directional), or does it
require a continuation (i.e., it lacks clear expectations)? (2) Does it elicit a sense of pleasantness, unpleasantness (annoyance), or apathy?
(3) What adjectives are appropriate for characterizing the pattern?
The subjects: All were adults; some had studied music, and the others could not read music.
The patterns examined: The patterns were characterized by pitch curves (convex, concave, zigzagged, and flat), combinations of
them on various levels, and interval sizes (small/large). All of them appear in various (more than two) realizations, with equal
durations (i.e., neutralization of the duration factor) and without repetition of notes. Some of them reappeared with additional facets
of organization: of durations (meter, regular or irregular rhythms), with or without concurrence between pitch and duration factors,
and with repetitions of notes. There were a total of 47 patterns; the number of notes in each pattern ranged from 8 to 16; and the
ambitus did not exceed an 11th (17 half-tones). The patterns were recorded on a computer in the timbre of a piano; between the
patterns, the subjects were given 12 seconds to write down their responses.
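As a concrete illustration of these constraints, a generator along the following lines could produce such stimuli: 8 to 16 equal-duration notes, no repeated adjacent notes, and an ambitus within 17 semitones. The contour logic, step sizes, and pitch range here are our own assumptions, not the authors' actual procedure.

```python
# Sketch of a constrained atonal-pattern generator (illustrative only).
import random

def make_pattern(n_notes, contour, ambitus=17, seed=None):
    """Return a list of MIDI pitches following a rough contour:
    'convex' rises then falls; 'zigzag' alternates direction.
    All pitches stay within the given ambitus, with no repeated notes."""
    rng = random.Random(seed)
    low, high = 60, 60 + ambitus          # assumed register (middle C upward)
    pitches = [rng.randrange(low, high + 1)]
    for i in range(1, n_notes):
        if contour == "convex":
            going_up = i < n_notes // 2   # ascend in the first half, then descend
        else:                             # zigzag: alternate direction each note
            going_up = i % 2 == 1
        step = rng.choice([1, 2, 3, 4]) * (1 if going_up else -1)
        nxt = min(max(pitches[-1] + step, low), high)  # clamp to the ambitus
        if nxt == pitches[-1]:            # avoid repetition of notes
            nxt += 1 if nxt < high else -1
        pitches.append(nxt)
    return pitches

p = make_pattern(10, "convex", seed=1)
print(p, "ambitus:", max(p) - min(p))
```

Equal durations would then be imposed at playback (e.g., one note per fixed time slice), neutralizing the duration factor as described above.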
D.2 A Selection of Significant Findings
General
1. There was usually a high correlation between "pleasant" and "directional."
CONCLUSION
To sum up, we have tried to highlight the importance of natural schemata in shaping musical directionality. To do so, we have
outlined the background to our work, which views directionality as one of the three main variables in characterizing
the stylistic ideal.
REFERENCES
Agmon, E. (1989). A mathematical model of the diatonic system. Journal of Music Theory, 33, 1-25.
Andrews, M. W. & Dowling, W. J. (1991). The development of perception of interleaved melodies and control of auditory
attention. Music Perception, 8, 349-368.
Apel, R. (1993). A cognitive model for transformation processes in Western tonal music. Doctoral diss., University of Haifa.
Arom, S. (1991). African Polyphony and Polyrhythm. Cambridge, Cambridge University Press.
Balzano, G. J. (1980). The group-theoretic description of 12-fold and microtonal pitch systems. Computer Music Journal, 4
(4), 66-84.
Bartlett, F. (1932). Remembering: a Study in Experimental and Social Psychology. Cambridge, Cambridge University Press.
Boulez, P. (1971). Boulez on Music Today. London, Faber and Faber.
Burns, E. M., & Ward, W. D. (1978). Categorical perception: phenomenon or epiphenomenon? Evidence from experiments
in the perception of melodic musical intervals. Journal of the Acoustical Society of America, 63, 456-468.
Carlsen, J. C. (Ed.). (1990). Music Expectancy (special issue). Psychomusicology, 9 (2).
Cohen, D. (1971). Palestrina counterpoint-a musical expression of unexcited speech. Journal of Music Theory, 15, 84-111.
Cohen, D. (1986). The performance practice of the Rig Veda: a musical expression of excited speech. Yuval, 6, 292-317.
Cohen, D. (1994). Directionality and complexity in music. Musikometrica, 6, 27-77.
Cohen, D., & Dubnov, S. (1997). Gestalt phenomena in musical texture. In M. Leman (Ed.), Music, Gestalt, and Computing.
Berlin, Springer, pp. 386-405.
Cohen, D., & Granot, R. (1995). Constant and variable influences on stages of musical activities: research based on
experiments using behavioral and electrophysiological indices. Journal of New Music Research, 24, 197-229.
Cohen, D., & Michelson, I. (1999). Directionality and the meaning of harmonic patterns. In I. Zamos (Ed.), Music and Sign:
Semiotic and Cognitive Studies in Music, Systematica Musicologica, vol. 2. Bratislava, Asco Art and Science, pp. 278-298.
Cohen, D., & Mondry, H. (1997). Learned and natural schemata in music. In Proceedings of the Third Triennial ESCOM
Conference, Uppsala, pp. 605-610.
Cohen, D., & Wagner, N. (2000). Concurrence and nonconcurrence between learned and natural schemata: the case of Johann
Sebastian Bach's saraband in C minor for cello solo. Journal of New Music Research (in press).
Dowling, W. J. (1978). Scale and contour: two components of a theory of memory for melodies. Psychological Review, 85,
341-354.
Edworthy, J. (1985). Melodic contour and musical structure. In P. Howell, J. Cross, and R. West (Eds.), Musical Structure
and Cognition. London, Academic Press.
Eitan, Z. (1997). High Points: a Study of Melodic Peak. Philadelphia, University of Pennsylvania Press.
Fónagy, I., & Magdics, K. (1972). Emotional patterns in intonation and music. In D. Bolinger (Ed.), Intonation: Selected
Readings. Harmondsworth, England, Penguin Education, pp. 286-312.
Goldstein, M. (1974). Sound texture. In J. Vinton (Ed.), Dictionary of 20th Century Music. London, Thames and Hudson, pp.
747-753.
Huron, D. (1996). On the kinematics of melodic contour: deceleration, declination and arch-trajectories in vocal phrases. In
Proceedings of the 4th ICMPC. Montreal.
Krumhansl, C. (1997). Effects of perceptual organization and musical form on melodic expectancies. In M. Leman (Ed.),
Music, Gestalt, and Computing. Berlin, Springer, pp. 294-320.
Lansky, P. (1974). Texture. In J. Vinton (Ed.), Dictionary of 20th Century Music. London, Thames and Hudson, pp. 741-747.
Levy, J. (1982). Texture as a sign in Classic and early Romantic music. JAMS, 35, 482-531.
Ligeti, G. (1993). States, events, transformation. Perspectives of New Music, 31, 164-171.
Lomax, A. (1977). Universals in song. The World of Music, 19, 117-129.
Lorince, F., Jr. (1966). A study of musical texture in relation to sonata-form as evidenced in selected keyboard sonatas.
Doctoral diss., University of Rochester.
Meyer, L. B. (1975). Explaining Music. Berkeley, CA, University of California Press.
Mondry, H. (1999). Schemata as a basis for perception and creation of music. M.A. thesis, Hebrew University of Jerusalem
(Hebrew).
Narmour, E. (1991). The top-down and bottom-up systems of musical implication: building on Meyer's theory of emotional
syntax. Music Perception, 9, 1-26.
Nettl, B. (1977). On the question of universals. The World of Music, 19, 2-7.
Rakowski, A. (1990). Intonation variants of musical intervals in isolation and in musical context. Psychology of Music, 18,
60-72.
Ratner, L. G. (1980). Texture, a rhetorical element in Beethoven's quartets. Israel Studies in Musicology, 2, 51-62.
Reese, G. (1959). Music in the Renaissance. New York, W. W. Norton.
Repp, B. H. (1998). Obligatory expectations of expressive timing induced by perception of musical structure. Psychological
Research, 61, 33-43.
Rothgeb, J. (1977). Design as a key to structure in tonal music. In M. Yeston (Ed.), Readings in Schenker Analysis and Other
Approaches. New Haven, CT, Yale University Press, pp. 72-93.
Sachs, C. (1946). The Commonwealth of Art. New York, W. W. Norton.
Shanon, B., & Atlan, H. (1990). Von Forster's theory: semantic application. New Ideas in Psychology, 9, 81-90.
Stevens, D. (1980). The Letters of Claudio Monteverdi. London, Faber and Faber.
Stockhausen, K. (1957). How time passes... Die Reihe, 3, 10-40 (English edition).
Sundberg, J., Friberg, A., & Fryden, L. (1991). Common secrets of musicians and listeners: an analysis by synthesis study of
musical performance. In P. Howell, R. West, and J. Cross (Eds.), Representing Musical Structure. London, Academic Press,
pp. 161-197.
Tversky, A. (1977). Features of similarity. Psychological Review, 84 (4), 327-352.
Varèse, E. (1967). The liberation of sound, excerpts from lectures of 1936-1962. In C. Chou (Ed.), Contemporary Composers
on Contemporary Music. New York.
Yeger-Granot, R. (1996). A study of musical expectancy by electrophysiological and behavioral measures. Doctoral diss.,
Hebrew University of Jerusalem.
Back to index
Proceedings abstract
HOW MUCH TIME DO WE NEED TO PROCESS HARMONIC STRUCTURES?
Bigand, E.*, d'Adamo, D., Madurell, F.**, Poulain, B., & Tillmann B.
* LEAD CNRS, Faculte des Sciences, Bld Gabriel, 21000 Dijon, FRANCE
** Music Department, Universite Paris IV La Sorbonne, France
*** Dartmouth College, USA
Background. The processing of a given musical event partly depends on the harmonic context in
which it occurs. Harmonically related events are usually processed faster than harmonically unrelated
ones.
Aims. The purpose of the present study was to investigate the time course of harmonic priming in
short and long contexts. In a short context (one-chord priming), the duration of the prime, the SOA, and
the ISI were manipulated in order to specify the speed at which abstract knowledge of Western
harmony may be activated in musicians and nonmusicians. In a longer context (nine-chord sequences),
the tempo of the sequence was manipulated in order to trace the time course at which harmonic
modulations are processed by musicians and nonmusicians.
Method. The harmonic priming paradigm was used. Participants were asked to quickly process a
target chord occurring after a short or a long musical context. The harmonic relationship between the target
and the previous context was manipulated. In single-chord priming, the harmonic relationship between the
target and the prime was varied around the circle of fifths. In long-chord-sequence priming, the target
chord was the tonic chord of a new key whose distance from the first key was manipulated around the
circle of fifths. It was assumed that distant key or chord relationships require more time to be
processed.
Results. Previous findings provide evidence that single-chord priming occurs for prime durations as
short as 50 ms. A 50 ms prime is no longer explicitly recognizable as music, but participants (even
nonmusicians) reacted differently to more or less harmonically related primes. This suggests that
knowledge of Western harmony may be activated very quickly. Experiments with long
(modulating) chord sequences are currently under way.
Conclusions. Previous findings provide evidence that simple harmonic relationships may be very
quickly processed, even by musically naive listeners.
Proceedings abstract
acohen@upei.ca
Background:
Aims:
This paper examines the validity of this view of music as metaphor for cinema.
The evaluation will be based on the applicability of contemporary knowledge
about music cognition to the experience of the art film.
Main contributions:
Implications:
Consistent with the intuitions of early French and American psychological film
theorists, this paper suggests the direct application of principles of music
cognition to cinema. The present approach sanctions the application of
music-theoretic ideas to film (see also Cook, 1998), but advocates, in
addition, the importance of considering cognitive constraints and proclivities
as determined by music cognition research.
Proceedings paper
In this study the researchers surveyed applied music students at five schools in the United States and one
conservatoire in Japan. The survey investigated student perceptions of their efficiency, motivation, concentration
and planning of practice time. Students were also asked to report total practice times per week, general
demographic information and other aspects of the student's musical background.
Method
The present study in the United States and Japan extended the work of Harald Jørgensen (1997), who created the
survey for students at the Norwegian Conservatoire of Music in Oslo, Norway. The present study broadened the
sample to include voice as well as instrumental students, both primary and secondary instruments, and both
music majors and non-music majors. Jørgensen translated the instrument into English for use in the United States
at two liberal arts colleges and three schools of music at major universities. Kiyoshi Miyamoto of Crown College
in Minnesota translated the survey into Japanese for use at a conservatoire in Japan.
Nine hundred seventy-seven applied music students participated in the study: two hundred eighty-nine
in Japan, four hundred thirty-nine at liberal arts colleges, and two hundred forty-nine at schools of
music in the United States. In addition to the survey, participating schools in the United States were asked to
have each of their applied music instructors identify one student who, in their opinion, was successful in the
practice studio and another who was less successful in their practice habits. Fifty-seven of those students were
interviewed. These interviews were recorded and then coded for study; that part of the analysis has not been
completed.
Results
Lammers and Kruger (1999) reported that there are substantial differences between instrument groups in minutes
per week of practice time (see Table 1 below). Practice times also differ between liberal arts colleges and schools
of music in the US. Students who report more planning also engage in more practice time. They also report
higher levels of efficiency, concentration, and motivation. It was also found that students who study two
instruments do not necessarily practice more than if they studied a single instrument.
Table 1
Mean Practice Time (Minutes per Week) by Nation and Instrument Group

Group                    Self-Reported Practice Time    SD      N
Keyboard
  Japan - Majors                   765.0               502.3   114
  Japan - Non-Majors               758.8               386.4    28
Wind/Percussion
  Japan - Majors                  1036.3               605.6    50
  Japan - Non-Majors              1230.0               212.1     2
Strings
  Japan - Majors                   695.6               360.8    18
  Japan - Non-Majors               665.0               643.5     2
Total practice time per week was divided into three categories: (1) those who practice five hours or fewer per
week, (2) those who practice between five and twenty hours per week, and (3) those who practice more than
twenty hours per week. Table 2 below illustrates differences between the schools.
Table 2
Self-Reported Practice Hours Per Week (HPW) by Music Majors
Country/Type School Less than 5 HPW 5 - 20 HPW More than 20 HPW
In our previous presentation we reported differences between when students plan their practice
time and the total amount of practice time they report. Students were asked when they planned practice time. The
five choices were: (1) before the practice day, (2) just before practice, (3) during practice, (4) just after practice,
and (5) between practice days. Students in general, regardless of country of origin, report that they plan 'just
before practice' and 'during practice'. However, students who spend more time practicing were more likely to
report planning 'before the practice day', 'just after practice', and 'between practice days'. These differences,
which can be seen in Figure 1, are found in each of the four primary instrument groups studied. They are
not noticeable among Japanese students (see Figure 2).
Systematic Planning
In this section we will suggest influences upon practice habits and differences between the two countries of Japan
and the United States. In particular, we will examine the impact of reporting that one is a systematic planner. At
one point in the survey participants were asked to rate themselves on a five-point scale in response to the
question, "Do you regard yourself as a person who uses planning of practice in a systematic way?" The mean
response of music majors at schools of music in the United States was 2.8, (SD 1.1, N = 87). Music majors at the
liberal arts colleges had a mean of 3.1, (SD 1.04, N = 290) while in Japan music majors had a mean of 2.6, (SD
0.89, N = 226). Some caution is advised in comparing means across nations because of potential differences in
the way Likert scales are used. Several experts on Japanese culture we've spoken to indicate that Japanese
students may simply be less likely to use the extreme ends of these scales. Differences within nations remain
quite interesting.
Connection to when Practice is Planned
When practice is planned was found to be associated with the extent to which subjects perceive that they are
systematic planners as well as to how much time students practice. Students in the United States who think they
are 'systematic' planners tend to report that they plan their practice day at the beginning of the day (r (255) = .46,
p < .01), between practice days (r (255) = .36, p < .01) and at the beginning of practice (r (255) = .28, p < .01).
These effects were weaker for Japanese students. The correlation between self-reports of systematic planning and
planning at the beginning of the day was found to be (r (222) = .35, p < .01). The effect was weaker but
significant for between practice days (r (222) = .20, p<.01) and even less at the beginning of practice time (r
(222) = .14, p <.01).
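The correlations reported here are Pearson product-moment coefficients, written r(df) with df = n - 2. A minimal sketch of the computation, using hypothetical ratings for illustration (the survey data are not reproduced here):

```python
# Pearson's r over paired Likert-style ratings, stdlib only.
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ratings, NOT the survey data:
systematic = [1, 2, 2, 3, 4, 4, 5, 5]   # "systematic planner" self-rating
plans_early = [1, 1, 3, 2, 3, 5, 4, 5]  # "plan before the practice day" rating
r = pearson_r(systematic, plans_early)
print(round(r, 2), "df =", len(systematic) - 2)
```

The p < .01 values in the text would then come from testing r against zero with n - 2 degrees of freedom.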
Perceived Change in Systematic Planning
All students were asked to answer on the five-point scale to, "Compare your planning now to the year
immediately preceding your study at this college/university? (We are interested in your general development, not
the often occurring and short time deviations)." Second, third and fourth year students were also asked, "How is
your planning now compared to earlier study at this college/university?" This allows us to ask if those who now
report that they are systematic planners also report that they have improved their practice habits. These
relationships are stronger for the Japanese students than for those in the United States. When comparing
'systematic' planning to 'year before' planning, the statistics are as follows: US (r (250) = .34, p < .01), Japan (r
(209) = .41, p < .01). Similar results occur in statistics that relate 'systematic' planning now to planning in
earlier years at their institution for second-, third-, and fourth-year students: US (r (250) = .31, p < .01), Japan
(r (209) = .42, p < .01). This pattern is also revealed when relationships are examined between 'year before
entering this school' and, for second-, third-, and fourth-year students, 'previous experience at this school': US
(r (255) = .48, p < .01), Japan (r (222) = .78, p < .01).
In Japan, wind and percussion students practice more minutes per week than keyboard players, who practice more
than string players, who in turn practice more than voice students. In the United States, keyboard students practice
more than wind and percussion students, followed by string players and voice students.
A comparison was made between the Conservatoire of Music in Oslo and the schools in Japan and the United States.
Forty-two percent of the Oslo students practice twenty hours per week or more, while only two percent of the
students at liberal arts colleges in the United States practice that amount of total time. Those same liberal arts
colleges have forty-six percent of their students reporting less than five hours of total practice time per week,
whereas only five percent of the Oslo students practice a total of five hours or less per week.
The researchers also reported earlier that, for some students, when practice time is planned has a relationship to total
practice time. Students in the United States who plan before the practice day, just after practice, and
between practice days also report that they practice more. This did not apply to students in Japan.
Students were asked if they consider themselves to be 'systematic' planners for practice. Music majors at the
liberal arts colleges in the United States report more systematic planning than do the students at schools of music
in the US and they in turn more than Japanese students. Also, those students in the US who rated themselves
systematic planners of practice time, plan at the beginning of the day, between practice days and at the beginning
of practice. Japanese students agree, to a lesser degree, only to planning at the beginning of the day.
When instrument groups are examined, Japanese students relate systematic planning now to planning in the year
before entering their current school more strongly than students in the United States in all areas except strings,
where the reverse was reported. In those same groupings, second-, third-, and fourth-year Japanese students relate
systematic planning now to planning in previous years at their school to a greater degree than students in the US,
again in all areas except strings. They also relate planning before their school enrollment to planning in previous
years at their school more strongly than students in the United States for all instrument groups except string
performers.
The influence of persons known to students on practice habits was considered in relation to self-reports of how
students rated themselves as systematic planners of practice time. The strongest relationship between the two was
found in the vocal area, where 'yourself' was the major influence upon planning of practice time in both
countries. 'Yourself' is also the most important influence upon planning by Japanese keyboard students, who are
also influenced by persons outside the school. Voice students in the United States relate, to a
lesser degree, to 'other faculty/advisor(s)'. The instrument teacher and theory teachers are reported as influencing
planning of practice time by US string students. US wind and percussion students have planning influenced by
'other voice/instrument teachers' and 'other faculty/advisor(s)'.
Relationships were found between planning in the year before attending their school and possible influences
upon the planning of practice time, and, for second-, third-, and fourth-year students, between planning in
previous years and whom they think influenced practice behavior. The strongest relationship of 'year before' to
'influence' was for Japanese vocal students, who reported 'yourself' as the person with the most influence. When
relating influences to planning in previous years, second-, third-, and fourth-year Japanese vocal students again
report the strongest influence as 'yourself'. For upper-class voice students in the United States, the strongest
relationship was in the 'year before' category, where they gave credit to their teacher for improvement. They also
reported the teacher as the strongest influence when comparing themselves to the year before they entered their
current school.
Conclusion
This paper has reported a set of relationships among and between the countries of the United States and Japan as
well as between types of schools and instrument categories. Analysis of the large amount of data produced by
this study has not been completed, notably, a comparison analysis of the data reported here and that reported
elsewhere by Harald Jørgensen at the Norwegian Conservatoire in Oslo, Norway. That project is planned for the
end of the year 2000. The researchers are also preparing to examine the information gathered through the
interview process in the United States. This study has shown that planning of practice time stands out as a very
important aspect of successful practice. How that planning is learned and taught, as well as the many other aspects
of successful instrument practice, may be answered with further study of the data now captured in our database.
Proceedings paper
Background
There have been ceaseless efforts to find a model of listeners' melodic expectancy, especially over
the past ten years. Narmour's implication-realization model (1990, 1992) is a music-theoretical and
psychological approach to listeners' melodic expectancy and confirmation, having its roots in Meyer's
theory (1956) and in Gestalt principles such as proximity, similarity, and common fate. This model
hypothesizes that melodic expectancy is influenced by two perceptual systems: a flexible,
variable, empirically driven top-down system, and a rigid, automatic, unconscious,
preprogrammed bottom-up system. In contrast to the learned top-down process, the bottom-up process is
regarded as an innate and pan-stylistic system operating at the tone-to-tone level. This bottom-up
system is partially based on universal, Gestalt-like principles and can be summarized as the following
five principles: registral direction, intervallic difference, registral return, proximity, and closure. The
five principles are defined as follows.
(1) Registral direction: small implicative intervals (≤ 5 semitones) imply continuation in the same
direction, whereas large implicative intervals (≥ 7 semitones) imply change in a different direction.
(2) Intervallic difference: small implicative intervals imply a subsequent realized interval that is
similar in size, whereas large implicative intervals imply a subsequent realized interval that is
relatively smaller in size.
(3) Registral return: it occurs when an implicative interval moves to a third note that is identical or
near (within ± 2 semitones) to the first note of the implicative interval.
(4) Proximity: any implicative interval implies a subsequent note near to (≤ 5 semitones) the second
note of the implicative interval.
(5) Closure: it occurs when registral direction of the implicative and realized intervals is different, or
when a large implicative interval is followed by a smaller realized interval, or both.
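The five definitions above can be sketched as Boolean predicates over three-note fragments (MIDI pitch numbers). The numeric thresholds follow the definitions just given; the function names, the +/- 2 semitone "similarity" margin in intervallic difference, and the treatment of the tritone (6 semitones, which the definitions leave unspecified) are our own illustrative assumptions.

```python
# Sketch of the five bottom-up principles as predicates on a three-note
# fragment (MIDI pitches p1, p2, p3); illustrative, not Narmour's notation.

def _direction(interval):
    """Sign of a melodic interval: +1 up, -1 down, 0 lateral."""
    return (interval > 0) - (interval < 0)

def registral_direction(p1, p2, p3):
    """Small implicative intervals (<= 5 semitones) imply continuation in
    the same direction; large ones (>= 7) imply a change of direction."""
    imp, real = p2 - p1, p3 - p2
    if abs(imp) <= 5:
        return _direction(imp) == _direction(real)
    if abs(imp) >= 7:
        return _direction(imp) != _direction(real)
    return True  # the tritone is left unspecified by the definitions

def intervallic_difference(p1, p2, p3):
    """Small implicative intervals imply a similar-sized realized interval;
    large ones imply a relatively smaller one."""
    imp, real = abs(p2 - p1), abs(p3 - p2)
    if imp <= 5:
        return abs(real - imp) <= 2  # 'similar in size': assumed margin
    if imp >= 7:
        return real < imp
    return True

def registral_return(p1, p2, p3):
    """The third note is identical or near (within +/- 2 semitones)
    to the first note of the implicative interval."""
    return abs(p3 - p1) <= 2

def proximity(p1, p2, p3):
    """The realized note lies within 5 semitones of the second note."""
    return abs(p3 - p2) <= 5

def closure(p1, p2, p3):
    """Direction change, or a large implicative interval followed by a
    smaller realized interval, or both."""
    imp, real = p2 - p1, p3 - p2
    change = _direction(imp) != _direction(real)
    narrowing = abs(imp) >= 7 and abs(real) < abs(imp)
    return change or narrowing

# C4 -> E4 -> G4: a small upward interval realized by continued ascent.
print(registral_direction(60, 64, 67))  # True
print(closure(60, 64, 67))              # False: no change, no narrowing
```

A fragment such as C4 -> C5 -> G4 instead satisfies registral direction (large interval, direction change) and closure, showing how the principles diverge across fragments.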
Of the five bottom-up principles mentioned above, the two primary ones are registral direction and
intervallic difference. Narmour's five bottom-up principles, including these two,
have recently been supported by several experimental studies (Cuddy & Lunney, 1995; Krumhansl,
1991, 1995; Russo & Cuddy, 1996; Schellenberg, 1996, 1997; Thompson, Cuddy & Plaus, 1996,
1997; Thompson & Stainton, 1998).
Cuddy & Lunney (1995) and Thompson, Cuddy & Plaus (1996, 1997) evaluated five bottom-up
principles with simple two-tone melodic intervals in a melodic continuation or completion test.
Krumhansl (1991, 1995) expanded the applicability of bottom-up principles into melodic excerpts
ranging from three to eight measures in three different styles: British folk songs (tonal melodies),
Webern Lieder (atonal melodies), and Chinese songs (non-Western tonal melodies). She concluded
that the judgments of continuation tones were predicted quite well by Narmour's bottom-up principles
and this was true regardless of style. Schellenberg (1996) also strongly supported Krumhansl's results,
using the same musical materials and data.
However, on closer examination, we may find that the bottom-up principles are not equally applicable
to all musical styles. In Krumhansl's studies (1991, 1995), the predictive power of the bottom-up principles
was weaker in atonal music than in tonal or non-Western ton